0. Introduction

These are notes from a three-lecture mini-course on free probability given at
MSRI in the Fall of 2010 and repeated a year later at Harvard. The lectures were 
aimed at mathematicians and mathematical physicists working in combinatorics, 
probability, and random matrix theory. The first lecture was a staged rediscovery of 
free independence from first principles, the second dealt with the additive calculus 
of free random variables, and the third focused on random matrix models. 

Most of my knowledge of free probability was acquired through informal con- 
versations with my thesis supervisor, Roland Speicher, and while he is an expert 
in the field the same cannot be said for me. These notes reflect my own limited 
understanding and are no substitute for complete and rigorous treatments, such as 
Voiculescu, Dykema and Nica [44], Hiai and Petz [18], and Nica and Speicher [28].
In addition to these sources, the expository articles of Biane [4], Shlyakhtenko [35]
and Tao [41] are very informative. 

I would like to thank the organizers of the MSRI semester "Random Matrix 
Theory, Interacting Particle Systems and Integrable Systems" for the opportunity 
to participate as a postdoctoral fellow. Special thanks are owed to Peter Forrester 
for coordinating the corresponding MSRI book series volume in which these notes 
appear. I am also grateful to the participants of the Harvard random matrices 
seminar for their insightful comments and questions. 

I am indebted to Michael LaCroix for making the illustrations which accompany 
these notes. 

1. Lecture One: Discovering the Free World 

1.1. Counting connected graphs. Let $m_n$ denote the number of simple, undirected
graphs on the vertex set $[n] = \{1, \dots, n\}$. We have $m_n = 2^{\binom{n}{2}}$, since each
pair of vertices is either connected by an edge or not. A more subtle quantity is the
number $c_n$ of connected graphs on $[n]$. The sequence $(c_n)_{n \geq 1}$ is listed as A001187 in
Sloane's Online Encyclopedia of Integer Sequences; its first few terms are

1, 1, 4, 38, 728, 26 704, 1 866 256, ....

Perhaps surprisingly, there is no closed formula for $c_n$. However, $c_n$ may be
understood in terms of the transparent sequence $m_n$ in several ways, each of which
corresponds to a combinatorial decomposition.

First, we may decompose a graph into two disjoint subgraphs: the connected
component of a distinguished vertex, say $n$, and everything else, i.e. the induced
subgraph on the remaining vertices. Looking at this the other way around, we may
build a graph as follows. From the vertices $1, \dots, n-1$ we can choose $k$ of these in
$\binom{n-1}{k}$ ways, and then build an arbitrary graph on these vertices in $m_k$ ways. On
the remaining $n-1-k$ vertices together with $n$, we may build a connected graph
in $c_{n-k}$ ways. This construction produces different graphs for different values of
$k$, since the size of the connected component containing the pivot vertex $n$ will be
different. Moreover, as $k$ ranges from zero to $n-1$ we obtain all graphs in this
fashion (with the convention $m_0 = 1$). Thus we have

$$m_n = \sum_{k=0}^{n-1} \binom{n-1}{k} m_k c_{n-k},$$

or equivalently

$$c_n = m_n - \sum_{k=1}^{n-1} \binom{n-1}{k} m_k c_{n-k}.$$

Figure 1. Thirty eight of sixty four graphs on four vertices are connected.

While this is not a closed formula, it allows the efficient computation of $c_n$ given
$c_1, \dots, c_{n-1}$.
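The recursion above can be run directly. The following is a minimal Python sketch, assuming the convention $m_0 = 1$; the function name `connected_graph_counts` is hypothetical and chosen only for illustration.

```python
from math import comb

def connected_graph_counts(n_max):
    """Return [c_1, ..., c_{n_max}] from the recursion
    c_n = m_n - sum_{k=1}^{n-1} C(n-1, k) * m_k * c_{n-k}."""
    # m[n] = 2^(n choose 2) counts all simple graphs on n labelled vertices;
    # m[0] = 1 by convention.
    m = [2 ** comb(n, 2) for n in range(n_max + 1)]
    c = [0] * (n_max + 1)  # c[0] is unused
    for n in range(1, n_max + 1):
        c[n] = m[n] - sum(comb(n - 1, k) * m[k] * c[n - k] for k in range(1, n))
    return c[1:]

print(connected_graph_counts(7))
# -> [1, 1, 4, 38, 728, 26704, 1866256]
```

Each new term costs only $O(n)$ arithmetic operations on previously computed values, which is the sense in which this recursion is efficient.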

A less efficient but ultimately more useful recursion can be obtained by viewing 
a graph as the disjoint union of its connected components. We construct a graph 
by first choosing a partition of the underlying vertex set into disjoint non-empty 
subsets $B_1, \dots, B_k$, and then building a connected graph on each of these, which
can be done in $c_{|B_1|} \cdots c_{|B_k|}$ ways. This leads to the formula

$$m_n = \sum_{\pi \in P(n)} \prod_{B \in \pi} c_{|B|},$$

where the summation is over the set of all partitions of $[n]$. We can split off the
term of the sum corresponding to the partition $[n] = [n]$ to obtain the recursion

$$c_n = m_n - \sum_{\substack{\pi \in P(n) \\ b(\pi) \geq 2}} \prod_{B \in \pi} c_{|B|},$$

in which we sum over partitions with at least two blocks; here $b(\pi)$ denotes the
number of blocks of $\pi$.

The above reasoning is applicable much more generally. Suppose that $m_n$ is the
number of "structures" which can be built on a set of $n$ labelled points, and that $c_n$
is the number of "connected structures" on these points of the same type. Then the
quantities $m_n$ and $c_n$ will satisfy the above (equivalent) relations. This fundamental
enumerative link between connected and disconnected structures is ubiquitous in
mathematics and the sciences, see [38, Chapter 5]. Prominent examples come from
enumerative algebraic geometry [32], where connected covers of curves are counted
in terms of all covers, and quantum field theory [10], where Feynman diagram sums
are reduced to summation over connected terms.

1.2. Cumulants and connectedness. The relationship between connected and 
disconnected structures is well-known to probabilists, albeit from a different point 
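of view. In stochastic applications, $m_n = m_n(X) = E[X^n]$ is the moment sequence
of a random variable $X$, and the quantities $c_n(X)$ defined by either of the equivalent
recurrences

$$m_n(X) = \sum_{k=0}^{n-1} \binom{n-1}{k} m_k(X) c_{n-k}(X)$$

$$m_n(X) = \sum_{\pi \in P(n)} \prod_{B \in \pi} c_{|B|}(X)$$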

are called the cumulants of X. This term was suggested by Harold Hotelling and 
subsequently popularized by Ronald Fisher and John Wishart in an influential 
1932 article [11]. Cumulants were, however, investigated as early as 1889 by the 
Danish mathematician and astronomer Thorvald Nicolai Thiele, who called them 
half-invariants. Thiele introduced the cumulant sequence as a transform of the 
moment sequence defined via the first of the above recurrences, and some years 
later arrived at the equivalent formulation using the second recurrence. The latter 
is now called the moment-cumulant formula. Thiele's contributions to statistics 
and the early theory of cumulants have been detailed by Anders Hald [17].

Cumulants are now well-established and frequently encountered in probability
and statistics, sufficiently so that the first four have been given names: mean,
variance, skewness, and kurtosis. The formulas for mean and variance in terms of
moments are simple and familiar,

$$c_1(X) = m_1(X)$$
$$c_2(X) = m_2(X) - m_1(X)^2,$$

whereas the third and fourth cumulants are more involved,

$$c_3(X) = m_3(X) - 3 m_2(X) m_1(X) + 2 m_1(X)^3$$
$$c_4(X) = m_4(X) - 4 m_3(X) m_1(X) - 3 m_2(X)^2 + 12 m_2(X) m_1(X)^2 - 6 m_1(X)^4.$$

In practice, statisticians often define skewness and kurtosis to be the third and fourth
cumulants scaled by a power of the variance.

Figure 2. The Gaussian density

It is not immediately clear why the cumulants of a random variable are of interest. 
If a random variable X is uniquely determined by its moments, then we may think 
of the moment sequence 

$$(m_1(X), m_2(X), \dots, m_n(X), \dots)$$

as coordinatizing X . Passing from moments to cumulants then amounts to a (poly- 
nomial) change of coordinates. Why is this advantageous? 

As a motivating example, let us compute the cumulant sequence of the most
important random variable, the standard Gaussian $X$. The distribution of $X$ has
density given by the bell curve

$$\mu_X(dt) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt$$

depicted in Figure 2.

We will now determine the moments of $X$. Let $z$ be a complex variable, and define

$$M_X(z) := \int_{\mathbb{R}} e^{tz} \mu_X(dt).$$

Since $e^{-t^2/2}$ decays rapidly as $|t| \to \infty$, $M_X(z)$ is a well-defined entire function of $z$
whose derivatives can be computed by differentiation under the integral sign,

$$M_X'(z) = \int_{\mathbb{R}} t e^{tz} \mu_X(dt), \quad M_X''(z) = \int_{\mathbb{R}} t^2 e^{tz} \mu_X(dt), \quad \dots.$$

In particular, the $n$th derivative of $M_X(z)$ at $z = 0$ is

$$M_X^{(n)}(0) = \int_{\mathbb{R}} t^n \mu_X(dt) = m_n(X),$$

so we have the Maclaurin series expansion

$$M_X(z) = \sum_{n=0}^{\infty} m_n(X) \frac{z^n}{n!}.$$

Thus, the integral $M_X(z)$ acts as a generating function for the moments of $X$. On
the other hand, this integral may be explicitly evaluated. Completing the square
in the exponent of the integrand we find that

$$M_X(z) = e^{z^2/2} \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-(t-z)^2/2}\, dt,$$

whence

$$M_X(z) = e^{z^2/2} = \sum_{k=0}^{\infty} \frac{z^{2k}}{2^k k!}$$

by translation invariance of Lebesgue measure. We conclude that the odd moments
of $X$ vanish while the even ones are given by the formula

$$m_{2k}(X) = \frac{(2k)!}{2^k k!} = (2k-1) \cdot (2k-3) \cdots 5 \cdot 3 \cdot 1.$$

This is the number of partitions of the set $[2k]$ into blocks of size two, also called
"pairings": we have $2k-1$ choices for the element to be paired with 1, then $2k-3$
choices for the element to be paired with the smallest remaining unpaired element,
etc. Alternatively, we may say that $m_n(X)$ is equal to the number of 1-regular
graphs on $n$ labelled vertices. It now follows from the fundamental link between
connected and disconnected structures that the cumulant $c_n(X)$ is equal to the
number of connected 1-regular graphs. Since the only connected 1-regular graph is a
single edge on two vertices, the cumulant sequence of a
standard Gaussian random variable is simply



(0,1,0,0,0,...) 
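This can be checked mechanically. The sketch below (an illustration only; the helper names `cumulants_from_moments` and `gaussian_moments` are hypothetical) inverts the first moment-cumulant recurrence and applies it to the double-factorial moments of the standard Gaussian.

```python
from math import comb

def cumulants_from_moments(moments):
    """Given [m_1, ..., m_N], return [c_1, ..., c_N] using
    c_n = m_n - sum_{k=1}^{n-1} C(n-1, k) * m_k * c_{n-k}  (with m_0 = 1)."""
    m = [1] + list(moments)
    c = [0] * len(m)  # c[0] is unused
    for n in range(1, len(m)):
        c[n] = m[n] - sum(comb(n - 1, k) * m[k] * c[n - k] for k in range(1, n))
    return c[1:]

def gaussian_moments(N):
    """Moments m_1..m_N of the standard Gaussian: 0 for odd n, (n-1)!! for even n."""
    out = []
    for n in range(1, N + 1):
        if n % 2 == 1:
            out.append(0)
        else:
            value = 1
            for j in range(1, n, 2):  # 1 * 3 * 5 * ... * (n-1)
                value *= j
            out.append(value)
    return out

print(gaussian_moments(8))                          # [0, 1, 0, 3, 0, 15, 0, 105]
print(cumulants_from_moments(gaussian_moments(8)))  # [0, 1, 0, 0, 0, 0, 0, 0]
```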

The fact that the universality of the Gaussian distribution is reflected in the
simplicity of its cumulant sequence signals cumulants as a key concept in probability 
theory. In Thiele's own words [T^ , 

This remarkable proposition has originally led me to prefer the 
half-invariants over every other system of symmetrical functions.

This sentiment persists amongst modern-day probabilists. To quote Terry Speed 
[37],

In a sense which it is hard to make precise, all of the important 
aspects of distributions seem to be simpler functions of cumulants 
than of anything else, and they are also the natural tools with which 
transformations of systems of random variables can be studied when 
exact distribution theory is out of the question. 
1.3. Cumulants and independence. The importance of cumulants stems,
ultimately, from their relationship with stochastic independence. Suppose that $X$
and $Y$ are a pair of independent random variables whose moment sequences have been
given to us by an oracle, and our task is to compute the moments of $X + Y$. Since
$E[X^a Y^b] = E[X^a] E[Y^b]$, this can be done using the formula

$$m_n(X + Y) = \sum_{k=0}^{n} \binom{n}{k} m_k(X) m_{n-k}(Y),$$

which is conceptually clear but computationally inefficient because of its dependence
on $n$. For example, if we want to compute $m_{100}(X + Y)$ we must evaluate a sum
with 101 terms, each of which is a product of three factors. Computations with
independent random variables simplify dramatically if one works with cumulants
rather than moments. Indeed, Thiele called cumulants "half-invariants" because

$$X, Y \text{ independent} \implies c_n(X + Y) = c_n(X) + c_n(Y) \quad \forall n \geq 1.$$

Thanks to this formula, if the cumulant sequences of $X$ and $Y$ are given, then each
cumulant of $X + Y$ can be computed simply by adding two numbers. The mantra
to be remembered is: 

cumulants linearize addition of independent random variables. 

For example, this fact together with the computation we did above yields that the
sum of two iid standard Gaussians is a Gaussian of variance two.
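As a sanity check on this example, one can compute the moments of the sum with the binomial formula and then convert to cumulants. The following Python sketch does this for two independent standard Gaussians; the helper names are hypothetical, and the moment lists are truncated at order six purely for illustration.

```python
from math import comb

def cumulants_from_moments(moments):
    """[m_1..m_N] -> [c_1..c_N] via c_n = m_n - sum_{k=1}^{n-1} C(n-1,k) m_k c_{n-k}."""
    m = [1] + list(moments)
    c = [0] * len(m)
    for n in range(1, len(m)):
        c[n] = m[n] - sum(comb(n - 1, k) * m[k] * c[n - k] for k in range(1, n))
    return c[1:]

def moments_of_independent_sum(mx, my):
    """Moments of X + Y for independent X, Y, from equal-length lists [m_1..m_N]:
    m_n(X+Y) = sum_{k=0}^{n} C(n,k) m_k(X) m_{n-k}(Y)."""
    x = [1] + list(mx)
    y = [1] + list(my)
    return [sum(comb(n, k) * x[k] * y[n - k] for k in range(n + 1))
            for n in range(1, len(x))]

gauss = [0, 1, 0, 3, 0, 15]  # m_1..m_6 of a standard Gaussian
total = moments_of_independent_sum(gauss, gauss)
print(total)                          # [0, 2, 0, 12, 0, 120]
print(cumulants_from_moments(total))  # [0, 2, 0, 0, 0, 0]
```

The cumulant sequence $(0, 2, 0, 0, \dots)$ is that of a centred Gaussian of variance two, and it is obtained on the cumulant side by a single addition per entry.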

In order to precisely understand the relationship between cumulants and inde- 
pendence, we need to extend the relationship between moments and cumulants to 
a relationship between mixed moments and mixed cumulants. Mixed moments are
easy to define: given a set of (not necessarily distinct) random variables $X_1, \dots, X_n$,

$$m_n(X_1, \dots, X_n) := E[X_1 \cdots X_n].$$

It is clear that $m_n(X_1, \dots, X_n)$ is a symmetric, multilinear function of its
arguments. The new notation for mixed moments is related to our old notation for pure
moments by

$$m_n(X) = m_n(X, \dots, X),$$

which we may keep as a useful shorthand.

We now define mixed cumulants recursively in terms of mixed moments using
the natural extension of the moment-cumulant formula:

$$m_n(X_1, \dots, X_n) = \sum_{\pi \in P(n)} \prod_{B \in \pi} c_{|B|}(X_i : i \in B).$$

For example, we have

$$m_2(X_1, X_2) = c_2(X_1, X_2) + c_1(X_1) c_1(X_2),$$

from which we find that the second mixed cumulant of $X_1$ and $X_2$ is their covariance,

$$c_2(X_1, X_2) = m_2(X_1, X_2) - m_1(X_1) m_1(X_2).$$
JONATHAN NOVAK WITH ILLUSTRATIONS BY MICHAEL LACROIX

More generally, the recurrence

$$c_n(X_1, \dots, X_n) = m_n(X_1, \dots, X_n) - \sum_{\substack{\pi \in P(n) \\ b(\pi) \geq 2}} \prod_{B \in \pi} c_{|B|}(X_i : i \in B)$$

facilitates a straightforward inductive proof that $c_n(X_1, \dots, X_n)$ is a symmetric,
$n$-linear function of its arguments, which explains Thiele's reference to cumulants
as his preferred system of symmetric functions.

The fundamental relationship between cumulants and stochastic independence 
is the following: X and Y are independent if and only if all their mixed cumulants 
vanish, 

$$c_2(X, Y) = 0$$
$$c_3(X, X, Y) = c_3(X, Y, Y) = 0$$
$$c_4(X, X, X, Y) = c_4(X, X, Y, Y) = c_4(X, Y, Y, Y) = 0$$
$$\vdots$$

The forward direction of this theorem,

$$X, Y \text{ independent} \implies \text{mixed cumulants vanish},$$

immediately yields Thiele's linearization property, since by multilinearity we have

$$c_n(X + Y) = c_n(X + Y, \dots, X + Y)$$
$$= c_n(X, \dots, X) + \text{mixed cumulants} + c_n(Y, \dots, Y)$$
$$= c_n(X) + c_n(Y).$$

Conversely, let $X, Y$ be a pair of random variables whose mixed cumulants vanish.
Let us check in a couple of concrete cases that this condition forces $X$ and $Y$ to
obey the algebraic identities associated with independent random variables. In the
first non-trivial case, $n = 2$, vanishing of mixed cumulants reduces the extended
moment-cumulant formula to

$$m_2(X, Y) = c_1(X) c_1(Y) = m_1(X) m_1(Y),$$

which is consistent with the factorization rule $E[XY] = E[X]E[Y]$ for independent
random variables. Now let us try an $n = 4$ example. We compute $m_4(X, X, Y, Y)$
directly from the extended moment-cumulant formula. Referring to Figure 3 we
find that vanishing of mixed cumulants implies

$$m_4(X, X, Y, Y) = c_2(X, X) c_2(Y, Y) + c_2(X, X) c_1(Y) c_1(Y) + c_2(Y, Y) c_1(X) c_1(X)$$
$$+ c_1(X) c_1(X) c_1(Y) c_1(Y),$$

which reduces to the factorization identity $E[X^2 Y^2] = E[X^2] E[Y^2]$.
Of course, if we compute $m_4(X, Y, X, Y)$ using the extended moment-cumulant
formula we should get the same answer, and indeed this is the case, but it is
important to note that the contributions to the sum come from different partitions,
as indicated in Figure 4.
Figure 3. Graphical evaluation of $m_4(X, X, Y, Y)$.

Figure 4. Graphical evaluation of $m_4(X, Y, X, Y)$.

1.4. Central Limit Theorem by cumulants. We can use the theory of cu- 
mulants presented thus far to prove an elementary version of the Central Limit 
Theorem. Let $X_1, X_2, X_3, \dots$ be a sequence of iid random variables, and let $X$ be a
standard Gaussian. Suppose that the common distribution of the variables $X_i$ has
mean zero, variance one, and finite moments of all orders. Put

$$S_N := \frac{X_1 + \cdots + X_N}{\sqrt{N}}.$$

Then, for each positive integer $n$,

$$\lim_{N \to \infty} m_n(S_N) = m_n(X).$$

Since moments and cumulants mutually determine one another, in order to prove
this CLT it suffices to prove that

$$\lim_{N \to \infty} c_n(S_N) = c_n(X)$$

for each $n \geq 1$. Now, by multilinearity of $c_n$ and independence of the $X_i$'s, we have

$$c_n(S_N) = c_n(N^{-1/2}(X_1 + \cdots + X_N))$$
$$= N^{-n/2} (c_n(X_1) + \cdots + c_n(X_N))$$
$$= N^{1 - n/2} c_n(X_1),$$

where the last line follows from the fact that the $X_i$'s are equidistributed. Thus: if
$n = 1$,

$$c_1(S_N) = N^{1/2} c_1(X_1) = 0;$$

if $n = 2$,

$$c_2(S_N) = c_2(X_1) = 1;$$

if $n > 2$,

$$c_n(S_N) = N^{\text{negative number}} c_n(X_1).$$

We conclude that

$$\lim_{N \to \infty} c_n(S_N) = \delta_{n2},$$

which we have already identified as the cumulant sequence of a standard Gaussian 
random variable. 
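The scaling $c_n(S_N) = N^{1-n/2} c_n(X_1)$ can be observed numerically. The sketch below is an illustration under stated assumptions: it takes the $X_i$ to be symmetric $\pm 1$ coin flips (mean zero, variance one, all moments finite), builds the moments of $X_1 + \cdots + X_N$ exactly by repeated use of the binomial formula, and only then rescales. All function names are hypothetical.

```python
from math import comb

def cumulants_from_moments(moments):
    """[m_1..m_N] -> [c_1..c_N] via c_n = m_n - sum_{k=1}^{n-1} C(n-1,k) m_k c_{n-k}."""
    m = [1] + list(moments)
    c = [0] * len(m)
    for n in range(1, len(m)):
        c[n] = m[n] - sum(comb(n - 1, k) * m[k] * c[n - k] for k in range(1, n))
    return c[1:]

def moments_of_independent_sum(mx, my):
    """m_n(X+Y) = sum_k C(n,k) m_k(X) m_{n-k}(Y) for independent X, Y."""
    x = [1] + list(mx)
    y = [1] + list(my)
    return [sum(comb(n, k) * x[k] * y[n - k] for k in range(n + 1))
            for n in range(1, len(x))]

def cumulants_of_normalized_sum(moments, N):
    """Cumulants of S_N = (X_1 + ... + X_N) / sqrt(N) for iid X_i with given moments."""
    total = list(moments)
    for _ in range(N - 1):
        total = moments_of_independent_sum(total, moments)
    c_sum = cumulants_from_moments(total)            # exact integer cumulants of the sum
    # n-linearity: dividing the variable by sqrt(N) divides c_n by N^(n/2)
    return [cn / N ** (n / 2) for n, cn in enumerate(c_sum, start=1)]

coin = [0, 1, 0, 1, 0, 1]  # m_1..m_6 of a symmetric +/-1 coin flip
for N in (1, 4, 100):
    print(N, cumulants_of_normalized_sum(coin, N))
```

For this choice of $X_i$ the fourth cumulant of $S_N$ is $-2/N$, so the printed sequences approach $(0, 1, 0, 0, 0, 0)$ as $N$ grows.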

1.5. Geometrically connected graphs. Let us now consider a variation on our
original graph-counting question. Given a graph $G$ on the vertex set $[n]$, we may
represent its vertices by $n$ distinct points on the unit circle (say, the $n$th roots of
unity) and its edges by straight line segments joining these points. This is how we
represented the set of four-vertex graphs in Figure 1. We will denote this geometric
realization of $G$ by $|G|$. The geometric realization of a graph carries extra structure
which we may wish to consider. For example, it may happen that $|G|$ is a connected
set of points in the plane even if the graph $G$ is not connected in the usual sense of
graph theory. Let $\kappa_n$ denote the number of geometrically connected graphs on $[n]$.
This is sequence A136653 in Sloane's database; its first few terms are

1, 1, 4, 39, 748, 27 162, 1 880 872, ....

Since geometric connectivity is a weaker condition than set-theoretic connectivity,
$\kappa_n$ grows faster than $c_n$; these sequences diverge from one another at $n = 4$, where
the unique disconnected but geometrically connected graph is the "crosshairs"
graph shown in Figure [H

Consider now the problem of computing $\kappa_n$. As with $c_n$, we can address this
problem by means of a combinatorial decomposition of the set of graphs with n 
vertices. However, this decomposition must take into account the planar nature 
of geometric connectivity, which our previous set-theoretic decompositions do not. 
Consequently, we must formulate a new decomposition. 

Given a graph $G$ on $[n]$, let $\pi(G)$ denote the partition of $[n]$ induced by the
connected components of $G$ ($i$ and $j$ are in the same block of $\pi(G)$ if and only if
they are in the same connected component of $G$), and let $\pi(|G|)$ denote the partition
of $[n]$ induced by the geometrically connected components of $|G|$ ($i$ and $j$ are in the
same block of $\pi(|G|)$ if and only if they are in the same geometrically connected
component of $|G|$). How are $\pi(G)$ and $\pi(|G|)$ related? To understand this, let us
view our geometric graph realizations as living in the hyperbolic plane rather than
the Euclidean plane. Thus Figure 1 depicts line systems in the Klein model, in
which the plane is an open disc and straight lines are chords of the boundary circle.
We could alternatively represent a graph in the Poincaré disc model, where straight
lines are arcs of circles orthogonal to the boundary circle, or in the Poincaré
half-plane model, where space is an open half-plane and straight lines are arcs of circles
orthogonal to the boundary line. The notion of geometric connectedness does not
depend on the particular realization chosen. The half-plane model has the useful
feature that the geometric realization $|G|$ essentially coincides with the pictorial
representation of $\pi(G)$, and we can see clearly that crossings in $|G|$ correspond
exactly to crossings in $\pi(G)$. Thus, $\pi(|G|)$ is obtained by fusing together crossing
blocks of $\pi(G)$. The resulting partition $\pi(|G|)$ no longer has any crossings: by
construction, it is a non-crossing partition, see figure [51

We can now obtain a recurrence for $\kappa_n$. We construct a graph by first choosing
a non-crossing partition of the underlying vertex set into blocks $B_1, \dots, B_k$ and
then building a geometrically connected graph on each block, which can be done in
$\kappa_{|B_1|} \cdots \kappa_{|B_k|}$ ways. This leads to the formula

$$m_n = \sum_{\pi \in NC(n)} \prod_{B \in \pi} \kappa_{|B|},$$

where the summation is over non-crossing partitions of $[n]$. Just as before, we can
split off the term of the sum corresponding to the partition with only one block to
obtain the recursion

$$\kappa_n = m_n - \sum_{\substack{\pi \in NC(n) \\ b(\pi) \geq 2}} \prod_{B \in \pi} \kappa_{|B|},$$

in which we sum over non-crossing partitions with at least two blocks. 
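The recursion over non-crossing partitions can be evaluated by brute force for small $n$. The following Python sketch is illustrative only: it enumerates all set partitions of $[n]$, keeps the non-crossing ones, and applies the recursion. The helper names are hypothetical, and the enumeration is exponential in $n$, so this is not an efficient method.

```python
from math import comb, prod

def set_partitions(n):
    """Yield every partition of {1, ..., n} as a list of blocks (lists of ints)."""
    if n == 0:
        yield []
        return
    for p in set_partitions(n - 1):
        for i in range(len(p)):              # add n to an existing block
            yield p[:i] + [p[i] + [n]] + p[i + 1:]
        yield p + [[n]]                      # or put n in a new singleton block

def is_noncrossing(p):
    """False iff two distinct blocks A, B contain a < b < c < d with
    a, c in one block and b, d in the other."""
    for A in p:
        for B in p:
            if A is B:
                continue
            for a in A:
                for c in A:
                    if a < c:
                        inside = any(a < b < c for b in B)
                        outside = any(b < a or b > c for b in B)
                        if inside and outside:
                            return False
    return True

def geometrically_connected_counts(n_max):
    """Return [kappa_1, ..., kappa_{n_max}] for m_n = 2^(n choose 2)."""
    m = [2 ** comb(n, 2) for n in range(n_max + 1)]
    kappa = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        correction = sum(
            prod(kappa[len(B)] for B in p)
            for p in set_partitions(n)
            if len(p) >= 2 and is_noncrossing(p)
        )
        kappa[n] = m[n] - correction
    return kappa[1:]

print(geometrically_connected_counts(7))
# -> [1, 1, 4, 39, 748, 27162, 1880872]
```

Dropping the `is_noncrossing(p)` filter turns the same loop into the set-partition recursion for $c_n$, which makes the difference between the two notions of connectedness explicit.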

1.6. Non-crossing cumulants. We have seen above that the usual set-theoretic 
notion of connectedness manifests itself probabilistically as the cumulant concept. 

We have also seen that set-theoretic connectedness has an interesting geometric 
variation, which we called geometric connectedness. This begs the question: 

Is there a probabilistic interpretation of geometric connectedness? 

Let $X$ be a random variable, with moments $m_n(X)$. Just as the classical
cumulants $c_n(X)$ were defined recursively using the relation between all structures and
connected structures, we define the non-crossing cumulants of $X$ recursively using
the relation between all structures and geometrically connected structures:

$$m_n(X) = \sum_{\pi \in NC(n)} \prod_{B \in \pi} \kappa_{|B|}(X).$$

We will call this the non-crossing moment-cumulant formula. Since connectedness
and geometric connectedness coincide for structures of size $n = 1, 2, 3$, the first
three non-crossing cumulants of $X$ are identical to its first three classical cumulants.
However, for $n \geq 4$, the non-crossing cumulants become genuinely new statistics of
$X$.

Our first step in investigating these new statistics is to look for a non-crossing 
analogue of the most important random variable, the standard Gaussian. This 
should be a random variable whose non-crossing cumulant sequence is 

0, 1, 0, 0, .... 

If this search leads to something interesting, we may be motivated to further inves- 
tigate non-crossing probability theory. If not, we will reject the idea as a will-o'- 
the-wisp. 

From the non-crossing moment-cumulant formula, we find that the moments of
the non-crossing Gaussian $X$ are given by

$$m_n(X) = \sum_{\pi \in NC(n)} \prod_{B \in \pi} \delta_{|B|,2} = \sum_{\pi \in NC_2(n)} 1.$$

That is, $m_n(X)$ is equal to the number of partitions in $NC(n)$ all of whose blocks
have size 2, i.e. non-crossing pairings of $n$ points. We know that there are no
pairings at all on an odd number of points, so the odd moments of $X$ must be
zero, which indicates that $X$ likely has a symmetric distribution. The number
of pairings on $n = 2k$ points is given by a factorial going down in steps of two,
$(2k-1)!! = (2k-1) \cdot (2k-3) \cdots 5 \cdot 3 \cdot 1$, so the number of non-crossing pairings
must be smaller than this double factorial.

Figure 7. Construction of the function $f$ from pairings to bitstrings.

In order to count non-crossing pairings on $2k$ points, we construct a function
$f$ from the set of all pairings on $2k$ points to length $2k$ sequences of $\pm 1$'s. This
function is easy to describe: if $i < j$ constitute a block of $\pi$, then the $i$th element of
$f(\pi)$ is $+1$ and the $j$th element of $f(\pi)$ is $-1$. See Figure 7 for an illustration of this
function in the case $k = 3$. By construction, $f$ is a surjection from the set of pairings
on $2k$ points onto the set of length $2k$ sequences of $\pm 1$'s all of whose partial sums
are non-negative and whose total sum is zero. We leave it to the reader to show
that the fibre of $f$ over any such sequence contains exactly one non-crossing pairing,
so that $f$ restricts to a bijection from non-crossing pairings onto its image. The
image sequences can be neatly enumerated using the Dvoretzky-Motzkin-Raney
cyclic shift lemma, as in [14, §7.5]. They are counted by the Catalan numbers

$$\mathrm{Cat}_k = \frac{1}{k+1} \binom{2k}{k},$$

which are smaller than the double factorials by a factor of $2^k/(k+1)!$. This indicates
that the distribution of $X$ decays even more rapidly than the Gaussian distribution
and might even be compactly supported.
We have discovered that

$$m_n(X) = \begin{cases} 0, & \text{if } n \text{ odd} \\ \mathrm{Cat}_{n/2}, & \text{if } n \text{ even.} \end{cases}$$
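For small $k$ the count of non-crossing pairings can be confirmed by exhaustive enumeration. The Python sketch below is only an illustration with hypothetical helper names: it lists all pairings of $2k$ points, filters out those containing a crossing, and compares with the double factorial and the Catalan number.

```python
from math import comb

def pairings(points):
    """Yield all pairings of the sorted list `points` as lists of pairs (i, j), i < j."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for p in pairings(remaining):
            yield [(first, partner)] + p

def is_noncrossing_pairing(p):
    """Two pairs (a, b) and (c, d) cross when a < c < b < d."""
    return not any(a < c < b < d for (a, b) in p for (c, d) in p)

def catalan(k):
    return comb(2 * k, k) // (k + 1)

for k in range(1, 6):
    pts = list(range(1, 2 * k + 1))
    total = sum(1 for _ in pairings(pts))
    noncrossing = sum(1 for p in pairings(pts) if is_noncrossing_pairing(p))
    print(k, total, noncrossing, catalan(k))
# k = 1..5: totals 1, 3, 15, 105, 945; non-crossing 1, 2, 5, 14, 42
```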



The Catalan numbers are ubiquitous in enumerative combinatorics, see [38, Exercise
6.19] as well as [39], and their appearance in this context is the first sign that we
are onto something interesting. We are now faced with an inverse problem: we are
not trying to calculate the moments of a random variable given its distribution,
rather we know that the moment sequence of $X$ is

$$0, \mathrm{Cat}_1, 0, \mathrm{Cat}_2, 0, \mathrm{Cat}_3, 0, \dots$$

and we would like to write down its distribution $\mu_X$. Equivalently, we are looking
for an integral representation of the entire function

$$M_X(z) = \sum_{n=0}^{\infty} \mathrm{Cat}_n \frac{z^{2n}}{(2n)!} = \sum_{n=0}^{\infty} \frac{z^{2n}}{n!(n+1)!}$$

which has the form

$$M_X(z) = \int_{\mathbb{R}} e^{tz} \mu_X(dt),$$

with $\mu_X$ a probability measure on the real line. The solution to this problem can
be extracted from the classical theory of Bessel functions.

The modified Bessel function $I_\alpha(z)$ of order $\alpha$ is one of two linearly independent
solutions to the modified Bessel equation

$$\left( z^2 \frac{d^2}{dz^2} + z \frac{d}{dz} - (z^2 + \alpha^2) \right) F = 0,$$

the other being the Macdonald function

$$K_\alpha(z) = \frac{\pi}{2} \frac{I_{-\alpha}(z) - I_\alpha(z)}{\sin(\alpha\pi)}.$$

The modified Bessel equation (and hence the functions $I_\alpha, K_\alpha$) appears in many
problems of physics and engineering since it is related to solutions of Laplace's
equation with cylindrical symmetry. An excellent reference on this topic is [1,
Chapter 4].

Interestingly, Bessel functions also occur in the combinatorics of permutations:
a remarkable identity due to Ira Gessel asserts that

$$\det[I_{i-j}(2z)]_{i,j=1}^{k} = \sum_{n=0}^{\infty} \mathrm{lis}_k(n) \frac{z^{2n}}{(n!)^2},$$

where $\mathrm{lis}_k(n)$ is the number of permutations in the symmetric group $S(n)$ with no
increasing subsequence of length $k + 1$. Gessel's identity was the point of departure
in the work of Jinho Baik, Percy Deift and Kurt Johansson who, answering a
question posed by Stanislaw Ulam, proved that the limit distribution of the length of
the longest increasing subsequence in a uniformly distributed random permutation
is given by the ($\beta = 2$) Tracy-Widom distribution. This non-classical distribution
was isolated and studied by Craig Tracy and Harold Widom in a series of works
on random matrix theory in the early 1990's where it emerged as the limiting
distribution of the top eigenvalue of large random Hermitian matrices. It has a
density which may also be described in terms of Bessel functions, albeit indirectly.
Consider the ordinary differential equation

$$\frac{d^2}{dx^2} u = 2u^3 + xu$$

for a real function $u = u(x)$, which is known as the Painlevé II equation after
the French mathematician (and two-time Prime Minister of France) Paul Painlevé.
It is known that this equation has a unique solution, called the Hastings-McLeod
solution, with the asymptotics $u(x) \sim -\mathrm{Ai}(x)$ as $x \to \infty$, where

$$\mathrm{Ai}(x) = \frac{1}{\pi} \sqrt{\frac{x}{3}}\, K_{1/3}\left( \frac{2}{3} x^{3/2} \right)$$

is a scaled specialization of the Macdonald function known as the Airy function.
Define the Tracy-Widom distribution function by

$$F(t) = e^{-\int_t^{\infty} (x - t) u(x)^2\, dx},$$

where $u$ is the Hastings-McLeod solution to Painlevé II. The theorem of Baik, Deift
and Johansson asserts that

$$\lim_{n \to \infty} \frac{1}{n!} \mathrm{lis}_{2\sqrt{n} + t n^{1/6}}(n) = F(t)$$

for any $t \in \mathbb{R}$. From this one may conclude, for example, that the probability a
permutation drawn uniformly at random from the symmetric group $S(n^2)$ avoids
the pattern $1\,2 \dots 2n+1$ converges to $F(0) = 0.9694\dots$. We refer the interested
reader to Richard Stanley's survey [40] for more information on this topic.

Nineteenth century mathematicians knew how to describe the modified Bessel
function both as a series,

$$I_\alpha(z) = \sum_{n=0}^{\infty} \frac{(z/2)^{2n+\alpha}}{n!\, \Gamma(n + 1 + \alpha)},$$

and as an integral,

$$I_\alpha(z) = \frac{(z/2)^\alpha}{\sqrt{\pi}\, \Gamma(\alpha + \frac{1}{2})} \int_0^{\pi} e^{(\cos\theta) z} (\sin\theta)^{2\alpha}\, d\theta.$$

From the series representation we find that

$$M_X(z) = \frac{I_1(2z)}{z},$$

and consequently we have the integral representation

$$M_X(z) = \frac{2}{\pi} \int_0^{\pi} e^{2(\cos\theta) z} \sin^2\theta\, d\theta.$$

This is one step removed from what we want: it tells us that the Catalan numbers
are the even moments of the random variable $X = 2\cos(Y)$, where $Y$ is a random
variable with distribution

$$\mu_Y(d\theta) = \frac{2}{\pi} \sin^2\theta\, d\theta$$

supported on the interval $[0, \pi]$. However, this is a rather interesting intermediate
step since the above measure appears in number theory, where it is called the
Sato-Tate distribution, see Figure 8.
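The claim that $2\cos(Y)$ has Catalan even moments and vanishing odd moments can be checked numerically. The sketch below is a rough illustration, assuming a plain midpoint rule with a hypothetical step count of 2000; it integrates $(2\cos\theta)^n$ against the density $\frac{2}{\pi}\sin^2\theta$ on $[0, \pi]$.

```python
from math import comb, cos, sin, pi

def moment_of_2cosY(n, steps=2000):
    """Midpoint-rule approximation of E[(2 cos Y)^n], where Y has density
    (2/pi) sin^2(theta) on [0, pi]."""
    h = pi / steps
    total = 0.0
    for j in range(steps):
        theta = (j + 0.5) * h
        total += (2 * cos(theta)) ** n * sin(theta) ** 2
    return (2 / pi) * total * h

def catalan(k):
    return comb(2 * k, k) // (k + 1)

for n in range(0, 11):
    exact = catalan(n // 2) if n % 2 == 0 else 0
    print(n, round(moment_of_2cosY(n), 8), exact)
```

The even-order values agree with 1, 1, 2, 5, 14, 42 up to rounding error, and the odd-order values are numerically zero.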

Figure 8. The Sato-Tate density

The Sato-Tate distribution arises in the arithmetic statistics of elliptic curves.
The location of integer points on elliptic curves is a classical topic in number theory.
For example, Diophantus of Alexandria wrote that the equation

$$y^2 = x^3 - 2$$

has the solution $x = 3$, $y = 5$, and in the 1650's Pierre de Fermat claimed that there
are no other positive integer solutions. This is the striking assertion that 26 is the
only number one greater than a perfect square and one less than a perfect cube,
see Figure 9. That this is indeed the case was proved by Leonhard Euler in 1770,
although according to some sources Euler's proof was incomplete and the solution
to this problem should be attributed to Axel Thue in 1908.

Modern number theorists study solutions to elliptic Diophantine equations by
reducing modulo primes. Given an elliptic curve

$$y^2 = x^3 + ax + b, \quad a, b \in \mathbb{Z},$$

let $\Delta = -16(4a^3 + 27b^2)$ be sixteen times the discriminant of $x^3 + ax + b$, and let
$S_p$ be the number of solutions of the congruence

$$y^2 \equiv x^3 + ax + b \mod p$$

where $p$ is a prime which does not divide $\Delta$. In his 1924 doctoral thesis, Emil Artin
conjectured that

$$|S_p - p| \leq 2\sqrt{p}$$

for all such good reduction primes. This remarkable inequality states that the
number of solutions modulo $p$ is roughly $p$ itself, up to an error of order $\sqrt{p}$. Artin's
conjecture was proved by Helmut Hasse in 1933. Around 1960, Mikio Sato and
John Tate became interested in the finer question of the distribution of the
centred and scaled solution count $(S_p - p)/\sqrt{p}$ for typical elliptic curves $E$ (meaning
those without complex multiplication) as $p$ ranges over the infinitely many primes
not dividing the discriminant of $E$. Because of Hasse's theorem, this amounts to
studying the distribution of the angle $\theta_p$ defined by

$$S_p - p = 2\sqrt{p} \cos\theta_p$$

in the interval $[0, \pi]$. Define a sequence $\mu_N^E$ of empirical probability measures
associated to $E$ by

$$\mu_N^E = \frac{1}{\pi(N)} \sum_{p \leq N} \delta_{\theta_p},$$

where $\pi(N)$ is the number of prime numbers less than or equal to $N$. Sato and
Tate conjectured that, for any elliptic curve $E$ without complex multiplication, $\mu_N^E$
converges weakly to the Sato-Tate distribution as $N \to \infty$. This is a universality
conjecture: it posits that certain limiting behaviour is common to a large class of
elliptic curves irrespective of their fine structural details. Major progress on the
Sato-Tate conjecture has been made within the last decade; we refer the reader to
the surveys of Barry Mazur [SF and Ram Murty and Kumar Murty [25] for further
information.

The random variable we seek is not the Sato-Tate variable $Y$, but twice its cosine,
$X = 2\cos(Y)$. Making the substitution $\theta = \arccos(s)$ in the integral representation
of $M_X(z)$ obtained above, we obtain

$$M_X(z) = \frac{2}{\pi} \int_{-1}^{1} e^{2sz} \sqrt{1 - s^2}\, ds,$$

and further substituting $t = 2s$ this becomes

$$M_X(z) = \frac{1}{2\pi} \int_{-2}^{2} e^{tz} \sqrt{4 - t^2}\, dt.$$

Thus the random variable $X$ with even moments the Catalan numbers and vanishing
odd moments is distributed in the interval $[-2, 2]$ with density

$$\mu_X(dt) = \frac{1}{2\pi} \sqrt{4 - t^2}\, dt,$$

which is both symmetric and compactly supported. This is another famous
distribution: it is called the Wigner semicircle distribution after the physicist Eugene
Wigner, who considered it in the 1950's in a context ostensibly unrelated to elliptic
curves. The density of $\mu_X$ is shown in Figure 10; note that it is not a semicircle,
but rather half an ellipse of semi-major axis two and semi-minor axis $1/\pi$.

Figure 10. The Wigner semicircle density

Wigner was interested in constructing models for the energy levels of complex 
systems, and hit on the idea that the eigenvalues of large symmetric random matri- 
ces provide a good approximation. Wigner considered N x N symmetric matrices 
Xjv whose entries Xj^(ij) are independent random variables, up to the symmetry 
constraint Xff{ij) = Xi^{ji). Random matrices of this form are now known as 
Wigner matrices, and their study remains a topic of major interest today. Wigner 
studied the empirical spectral distribution of the eigenvalues of Xjv, i.e. the prob- 
ability measure 



which places mass \/N at each eigenvalue Xn- Note that, unlike in the setting 
above where we considered the sequence of empirical measures associated to a fixed 
elliptic curve E, the measure at is a random measure since Xn is a random matrix. 
Wigner showed that the limiting behaviour oi hn does not depend on the details of 
the random variables which make up X^- In |45| . he made the following hypotheses: 

(1) Each $X_N(ij)$ has a symmetric distribution; 

(2) Each $X_N(ij)$ has finite moments of all orders, each of which is bounded by 
a constant independent of $N, i, j$; 

(3) The variance of $X_N(ij)$ is $1/N$. 

Wigner proved that, under these hypotheses, $\mu_N$ converges weakly to the semicircle 
law which now bears his name. We will see a proof of Wigner's theorem for random 
matrices with (complex) Gaussian entries in Lecture Three. The universality of the 
spectral structure of real and complex Wigner matrices holds at a much finer level, 
and under much weaker hypotheses, both at the edges of the semicircle [36] and in 
the bulk [91142]. 
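
Wigner's theorem can be probed numerically through moments, since the $k$-th moment 
of $\mu_N$ is the normalized trace $\frac{1}{N} \mathrm{Tr}\, X_N^k$ and no eigenvalue 
solver is needed. The following is a minimal sketch, assuming entries equal to 
$\pm 1/\sqrt{N}$ with equal probability (one convenient choice satisfying hypotheses 
(1)-(3)). The name `wigner_moments` and the default seed are illustrative choices and 
do not come from [45]. 

```python
import random


def wigner_moments(N, seed=0):
    """Second, third and fourth moments of the empirical spectral
    distribution of one sampled N x N symmetric sign matrix.

    Assumption: every entry is +1/sqrt(N) or -1/sqrt(N) with equal
    probability, independently up to the symmetry constraint.
    """
    rng = random.Random(seed)
    scale = N ** -0.5
    X = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i, N):
            X[i][j] = X[j][i] = rng.choice((-1.0, 1.0)) * scale

    # X2 = X * X; traces of X^3 and X^4 can be read off from X and X2.
    X2 = [[sum(X[i][k] * X[k][j] for k in range(N)) for j in range(N)]
          for i in range(N)]
    m2 = sum(X2[i][i] for i in range(N)) / N
    m3 = sum(X2[i][j] * X[j][i] for i in range(N) for j in range(N)) / N
    m4 = sum(X2[i][j] * X2[j][i] for i in range(N) for j in range(N)) / N
    return m2, m3, m4


if __name__ == "__main__":
    print(wigner_moments(80))
```

For this entry distribution the second moment equals 1 up to rounding, while the 
third and fourth moments should be close to 0 and 2, the corresponding moments of the 
semicircle law, with fluctuations that shrink as $N$ grows. 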

1.7. Non-crossing independence. Our quest for the non-crossing Gaussian has 
brought us into contact with interesting objects (random permutations, elliptic 
curves, random matrices) and the limit laws which govern them (Tracy-Widom 
distribution, Sato-Tate distribution, Wigner semicircle distribution). This 
motivates us to continue developing the rudiments of non-crossing probability 
theory; perhaps we have hit on a framework within which these objects may be studied. 

Our next step is to introduce a notion of non-crossing independence. We know 
that classical independence is characterized by the vanishing of mixed cumulants. 
Imitating this, we will define non-crossing independence via the vanishing of mixed 
non-crossing cumulants. Like classical mixed cumulants, the non-crossing mixed 
cumulant functionals are defined recursively via the multilinear extension of the 
non-crossing moment-cumulant formula, 

$$m_n(X_1, \dots, X_n) = \sum_{\pi \in NC(n)} \prod_{B \in \pi} \kappa_{|B|}(X_i : i \in B).$$

The recurrence 

$$\kappa_n(X_1, \dots, X_n) = m_n(X_1, \dots, X_n) - \sum_{\substack{\pi \in NC(n) \\ b(\pi) \geq 2}} \prod_{B \in \pi} \kappa_{|B|}(X_i : i \in B),$$

where $b(\pi)$ is the number of blocks of $\pi$, and induction establish that 
$\kappa_n(X_1, \dots, X_n)$ is a symmetric multilinear function of its arguments. Two 
random variables $X, Y$ are said to be non-crossing independent if their mixed 
non-crossing cumulants vanish: 

$$\begin{aligned}
\kappa_2(X,Y) &= 0 \\
\kappa_3(X,X,Y) = \kappa_3(X,Y,Y) &= 0 \\
\kappa_4(X,X,X,Y) = \kappa_4(X,X,Y,Y) = \kappa_4(X,Y,Y,Y) &= 0 \\
&\;\;\vdots
\end{aligned}$$

An almost tautological consequence of this definition is: 

$$X, Y \text{ non-crossing independent} \implies \kappa_n(X+Y) = \kappa_n(X) + \kappa_n(Y) \quad \forall n \geq 1.$$

Thus, just as classical cumulants linearize the addition of classically independent 
random variables, 



non-crossing cumulants linearize addition of non-crossing independent random variables. 

We can also note that the semicircular random variable $X$, whose non-crossing 
cumulant sequence is $0, 1, 0, 0, \dots$, plays the role of the standard Gaussian with 
respect to this new notion of independence. For example, since non-crossing 
cumulants linearize the addition of non-crossing independent random variables, the 
sum of two non-crossing independent semicircular random variables is a semicircular 
random variable of variance two. The non-crossing analogue of the Central Limit 
Theorem asserts that, if $X_1, X_2, \dots$ is a sequence of non-crossing independent 
and identically distributed random variables with mean zero and variance one, then 
the moments of 

$$\frac{X_1 + \dots + X_N}{\sqrt{N}}$$

converge to the moments of the standard semicircular $X$ as $N \to \infty$. The proof 
of this fact is identical to the proof of the classical Central Limit Theorem given 
above, except that classical cumulants are replaced by non-crossing cumulants. 

Figure 11. Graphical evaluation of $m_4(X,X,Y,Y)$ using non-crossing cumulants. 
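
The non-crossing moment-cumulant formula is easy to evaluate by brute force for 
small $n$. The sketch below enumerates $NC(n)$ recursively (the block containing the 
first point splits the remaining points into gaps, each of which is partitioned 
independently) and sums products of cumulants over blocks. The function names and the 
convention that `kappa` is a function $k \mapsto \kappa_k$ are illustrative 
assumptions made for this example. 

```python
from itertools import combinations, product
from math import prod


def noncrossing_partitions(points):
    """Yield every non-crossing partition of an ordered list of points."""
    points = list(points)
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for size in range(len(rest) + 1):
        for chosen in combinations(range(len(rest)), size):
            block = [first] + [rest[i] for i in chosen]
            # The points strictly between consecutive block elements form
            # gaps; any other block must live inside a single gap.
            cuts = [-1, *chosen, len(rest)]
            gaps = [rest[cuts[j] + 1:cuts[j + 1]] for j in range(len(cuts) - 1)]
            fillings = product(*(list(noncrossing_partitions(g)) for g in gaps))
            for filling in fillings:
                yield [block] + [b for part in filling for b in part]


def moment_from_free_cumulants(kappa, n):
    """m_n as the sum over NC(n) of the product of kappa(|B|) over blocks B."""
    return sum(prod(kappa(len(block)) for block in pi)
               for pi in noncrossing_partitions(range(n)))


def semicircle_kappa(k):
    """Non-crossing cumulants 0, 1, 0, 0, ... of the standard semicircular."""
    return 1 if k == 2 else 0


if __name__ == "__main__":
    print([moment_from_free_cumulants(semicircle_kappa, n) for n in range(1, 9)])
    # prints [0, 1, 0, 2, 0, 5, 0, 14]
```

The even moments are the Catalan numbers, as expected. Doubling the cumulant 
sequence, which corresponds to adding two non-crossing independent semicircular 
variables, gives even moments $2, 8, 40$, i.e. $2^k$ times the Catalan numbers, the 
moments of a semicircular variable of variance two. 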

Of course, we don't really know what non-crossing independence means. For 
example, if $X$ and $Y$ are non-crossing independent, is it true that 
$E[XY] = E[X]E[Y]$? The answer is yes, since classical and non-crossing mixed 
cumulants agree up to and including order three, 

$$c_1(X) = \kappa_1(X), \quad c_2(X,Y) = \kappa_2(X,Y), \quad c_3(X,Y,Z) = \kappa_3(X,Y,Z).$$

But what about higher order mixed moments? 

We observed above that, in the classical case, vanishing of mixed cumulants 
allows us to recover the familiar algebraic identities governing the expectation of 
independent random variables. We do not have a priori knowledge of the algebraic 
identities governing the expectation of non-crossing independent random variables, 
so we must discover them using the vanishing of mixed non-crossing cumulants. 
Let us see what this implies for the mixed moment $m_4(X,X,Y,Y) = E[X^2Y^2]$. 
Referring to Figure 11 we see that in this case the non-crossing moment-cumulant 
formula reduces to 

$$\begin{aligned}
m_4(X,X,Y,Y) = {} & \kappa_2(X,X)\kappa_2(Y,Y) + \kappa_2(X,X)\kappa_1(Y)\kappa_1(Y) + \kappa_2(Y,Y)\kappa_1(X)\kappa_1(X) \\
& + \kappa_1(X)\kappa_1(X)\kappa_1(Y)\kappa_1(Y),
\end{aligned}$$

which is exactly the formula we obtained for classically independent random 
variables using the classical moment-cumulant formula. 

However, when we use the non-crossing moment-cumulant formula to evaluate the 
same mixed moment with its arguments permuted, we instead get 

$$m_4(X,Y,X,Y) = \kappa_2(X,X)\kappa_1(Y)\kappa_1(Y) + \kappa_2(Y,Y)\kappa_1(X)\kappa_1(X) + \kappa_1(X)\kappa_1(X)\kappa_1(Y)\kappa_1(Y),$$

see Figure 12. 

Figure 12. Graphical evaluation of $m_4(X,Y,X,Y)$ using non-crossing cumulants. 



Since $m_4(X,X,Y,Y) = m_4(X,Y,X,Y)$, we are forced to conclude that the two 
expressions obtained are equal, which in turn forces 

$$\kappa_2(X,X)\kappa_2(Y,Y) = 0.$$

Thus, if $X, Y$ are non-crossing independent random variables, at least one of them 
must have vanishing variance, and consequently must be almost surely constant. 
The converse is also true: one can show that a (classical or non-crossing) mixed 
cumulant vanishes if any of its entries are constant random variables. So we 
have classified pairs of non-crossing independent random variables: they look like 
$\{X, Y\} = \{\text{arbitrary}, \text{constant}\}$. Such pairs of random variables are 
of no interest from a probabilistic perspective. It would seem that non-crossing 
probability is a dead end. 
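
The discrepancy between the two orderings can be checked mechanically. The sketch 
below evaluates a mixed moment of a word in $X$ and $Y$ from the non-crossing 
moment-cumulant formula under the assumption that all mixed cumulants vanish, so that 
only partitions with single-letter blocks contribute. The data layout (a dictionary of 
cumulants per letter) and the numerical values are hypothetical and chosen only for 
illustration. 

```python
from itertools import combinations, product


def noncrossing_partitions(points):
    """Yield every non-crossing partition of an ordered list of points."""
    points = list(points)
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for size in range(len(rest) + 1):
        for chosen in combinations(range(len(rest)), size):
            block = [first] + [rest[i] for i in chosen]
            cuts = [-1, *chosen, len(rest)]
            gaps = [rest[cuts[j] + 1:cuts[j + 1]] for j in range(len(cuts) - 1)]
            fillings = product(*(list(noncrossing_partitions(g)) for g in gaps))
            for filling in fillings:
                yield [block] + [b for part in filling for b in part]


def mixed_moment(word, kappa):
    """Moment of a word such as "XYXY", assuming mixed cumulants vanish.

    kappa[letter][k] is the k-th non-crossing cumulant of that letter;
    missing entries are treated as zero.
    """
    total = 0
    for pi in noncrossing_partitions(range(len(word))):
        term = 1
        for block in pi:
            letters = {word[i] for i in block}
            if len(letters) > 1:  # a mixed cumulant: zero by assumption
                term = 0
                break
            term *= kappa[letters.pop()].get(len(block), 0)
        total += term
    return total


if __name__ == "__main__":
    kappa = {"X": {1: 1, 2: 3}, "Y": {1: 2, 2: 5}}
    print(mixed_moment("XXYY", kappa), mixed_moment("XYXY", kappa))
    # prints 36 21
```

With these sample values the two orderings differ by $36 - 21 = 15 = 
\kappa_2(X,X)\kappa_2(Y,Y)$, which is exactly the term that commutativity forces to 
vanish. 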

1.8. The medium is the message. If $\Omega$ is a compact Hausdorff space then the 
algebra $\mathcal{A}(\Omega)$ of continuous functions $X : \Omega \to \mathbb{C}$ is a 
commutative C*-algebra. 

This means that in addition to its standard algebraic structure (pointwise 
addition, multiplication and scalar multiplication of functions) 
$\mathcal{A}(\Omega)$ is equipped with a norm satisfying the Banach algebra axioms 
and an antilinear involution which is compatible with the norm, 
$\|X^*X\| = \|X\|^2$. The norm comes from the topology of the source, 
$\|X\| = \sup_{\omega} |X(\omega)|$, and the involution comes from the conjugation 
automorphism of the target, $X^*(\omega) = \overline{X(\omega)}$. Conversely, a 
famous theorem of Israel Gelfand asserts that any unital commutative C*-algebra 
$\mathcal{A}$ can be realized as the algebra of continuous functions on a compact 
Hausdorff space $\Omega(\mathcal{A})$ in an essentially unique way. In fact, 
$\Omega(\mathcal{A})$ may be constructed as the set of maximal ideals of 
$\mathcal{A}$ equipped with a suitable topology. The associations 
$\Omega \mapsto \mathcal{A}(\Omega)$ and $\mathcal{A} \mapsto \Omega(\mathcal{A})$ are 
contravariantly functorial and set up a dual equivalence between the category of 
compact Hausdorff spaces and the category of unital commutative C*-algebras. 

There are many situations in which one encounters a category of spaces dually 
equivalent to a category of algebras. In a wonderful book [26], the mathematicians 
collectively known as Jet Nestruev develop the theory of smooth real manifolds 
entirely upside-down: the theory is built in the dual algebraic category, whose 
objects Nestruev terms smooth complete geometric R-algebras, and then exported to 
the geometric one by a contravariant functor. In many situations, given a category 
of spaces dually equivalent to a category of algebras it pays to shift our stance 
and view the algebraic category as primary. In particular, the algebraic point 
of view is typically easier to generalize. This is the paradigm shift driving Alain 
Connes' non-commutative geometry programme, and the reader is referred to [7] 
for much more information. 

This paradigm shift is precisely what is needed in order to salvage non-crossing 
probability theory. In probability theory, the notion of space is that of a Kolmogorov 
triple $(\Omega, \mathcal{F}, P)$ which models the probability to observe a stochastic 
system in a given state or collection of states. The dual algebraic object associated 
to a Kolmogorov triple is $L^\infty(\Omega, \mathcal{F}, P)$, the algebra of 
essentially bounded complex random variables $X : \Omega \to \mathbb{C}$. Just like 
in the case of continuous functions on a compact Hausdorff space, this algebra has a 
very special structure: it is a commutative von Neumann algebra equipped with a 
unital faithful tracial state, $\tau[X] = \int_\Omega X\, dP$. Moreover, there is an 
analogue of Gelfand's theorem in this setting which says that any commutative von 
Neumann algebra can be realized as the algebra of bounded complex random variables on 
a Kolmogorov triple in an essentially unique way. This is the statement that the 
categories of Kolmogorov triples and commutative von Neumann algebras are dually 
equivalent. 

Non-crossing independence was rendered trivial by the commutativity of random 
variables. We can rescue it from the abyss by following the lead of non-commutative 
geometry and dropping commutativity in the dual category: we shift our stance 
and define a non-commutative probability space to be a pair $(\mathcal{A}, \tau)$ 
consisting of a possibly non-commutative complex associative unital algebra 
$\mathcal{A}$ together with a unital linear functional 
$\tau : \mathcal{A} \to \mathbb{C}$. If we reinstate commutativity and insist that 
$\mathcal{A}$ is a von Neumann algebra and $\tau$ a faithful tracial state, we are 
looking at essentially bounded random variables on a Kolmogorov triple, but a general 
non-commutative probability space need not be an avatar of any classical 
probabilistic entity. 

As a nod to the origins of this definition, and in order to foster analogies with 
classical probability, we refer to the elements of $\mathcal{A}$ as random variables 
and call $\tau$ the expectation functional. This prompts some natural questions. 
Before this subsection we only discussed real random variables; complex numbers crept 
in with the abstract nonsense. What is the analogue of the notion of real random 
variable in a non-commutative probability space? Probabilists characterize random 
variables in terms of their distributions. Can we assign distributions to random 
variables living in a non-commutative probability space? Is it possible to give 
meaning to the phrase "the distribution of a bounded real random variable living in 
a non-commutative probability space is a compactly supported probability measure 
on the line"? We will deal with some of these questions at the end of Lecture Two. 
For now, however, we remain in the purely algebraic framework, where the closest 
thing to the distribution of a random variable $X \in \mathcal{A}$ is its moment 
sequence $m_n(X) = \tau[X^n]$. As in gH Page 12], 

The algebraic context is not used in the pursuit of generality, but 
rather of transparence. 

1.9. A brief history of the free world. Having cast off the yoke of commutativity, 
we are free: free to explore non-crossing probability in the new framework 
provided by the non-commutative probability space concept. Non-crossing probability 
has become Free Probability, and will henceforth be referred to as such. 
Accordingly, non-crossing cumulants will now be referred to as free cumulants, and 
non-crossing independence will be termed free independence. 

The reader is likely aware that free probability is a flourishing area of 
contemporary mathematics. This first lecture has been historical fiction, and is 
essentially an extended version of [27]. Free probability was not discovered in the 
context of graph enumeration problems, or by tampering with the cumulant concept, 
although in retrospect it might have been. Rather, free probability theory was 
invented by Dan-Virgil Voiculescu in the 1980's in order to address a famous open 
problem in the theory of von Neumann algebras, the free group factors isomorphism 
problem. The problem is to determine when the von Neumann algebra of the free group 
on $a$ generators is isomorphic to the von Neumann algebra of the free group on $b$ 
generators. It is generally believed that these are isomorphic von Neumann algebras 
if and only if $a = b$, but this remains an open problem. Free probability theory 
(and its name) originated in this operator-algebraic context. 

Voiculescu's definition of free independence, which was modelled on the free 
product of groups, is the following: random variables $X, Y$ in a non-commutative 
probability space $(\mathcal{A}, \tau)$ are said to be freely independent if 

$$\tau[f_1(X) g_1(Y) \cdots f_k(X) g_k(Y)] = 0$$

whenever $f_1, g_1, \dots, f_k, g_k$ are polynomials such that 

$$\tau[f_1(X)] = \tau[g_1(Y)] = \dots = \tau[f_k(X)] = \tau[g_k(Y)] = 0.$$

This should be compared with the definition of classical independence: random 
variables $X, Y$ in a non-commutative probability space $(\mathcal{A}, \tau)$ are said 
to be classically independent if they commute, $XY = YX$, and if 

$$\tau[f(X) g(Y)] = 0$$

whenever $f$ and $g$ are polynomials such that $\tau[f(X)] = \tau[g(Y)] = 0$. These 
two definitions are antithetical: classical independence has commutativity built into 
it, while free independence becomes trivial if commutativity is imposed. 
Nevertheless, both notions are accommodated within the non-commutative probability 
space framework. 

The precise statement of equivalence between classical independence and 
vanishing of mixed cumulants is due to Gian-Carlo Rota [31]. In the 1990's, knowing 
both Voiculescu's new free probability theory and Rota's approach to classical 
probability theory, Roland Speicher made the beautiful discovery that by excising 
the lattice of set partitions from Rota's foundations and replacing it with the 
lattice of non-crossing partitions, much of Voiculescu's theory could be recovered 
and extended by elementary combinatorial methods. In particular, Speicher showed 
that free independence is equivalent to the vanishing of mixed free cumulants. The 
combinatorial approach to free probability is exhaustively applied in [28], while the 
original analytic approach of Voiculescu is detailed in [44]. 

2. Lecture Two: Exploring the Free World 

Lecture One culminated in the notion of a non-commutative probability space 
and the realization that this framework supports two types of independence: clas- 
sical independence and free independence. From here we can proceed in several 
ways. One option is to prove an abstract result essentially stating that these are 
the only notions of independence which can occur. This result, due to Speicher, 
places classical and free independence on equal footing. Another possibility is to 
present concrete problems of intrinsic interest where free independence naturally 
appears. We will pursue the second route, and examine problems emerging from the 
theory of random walks on groups which can be recast as questions about free ran- 
dom variables. In the course of solving these problems we will develop the calculus 
of free random variables and explore the terrain of the free world. 

2.1. Random walk on the integers. The prototypical example of a random walk 
on a group is the simple random walk on $\mathbb{Z}$: a walker initially positioned 
at zero tosses a fair coin at each tick of the clock; if it lands heads he takes a 
step of $+1$, if it lands tails he takes a step of $-1$. A random walk is said to be 
recurrent if it returns to its initial position with probability one, and transient 
if not. Is the simple random walk on $\mathbb{Z}$ recurrent or transient? 

Let $\alpha(n)$ denote the number of walks which return to zero for the first time 
after $n$ steps, and let $\phi(n) = 2^{-n}\alpha(n)$ denote the corresponding 
probability that the first return occurs at time $n$. Note that 
$\alpha(0) = \phi(0) = 0$, and define 

$$F(z) = \sum_{n=0}^{\infty} \phi(n) z^n.$$

Then 

$$F(1) = \sum_{n=0}^{\infty} \phi(n) \leq 1$$

is the probability we seek. The radius of convergence of $F(z)$ is at least one, and 
by Abel's theorem 

$$F(1) = \lim_{x \to 1} F(x)$$

as $x$ approaches 1 in the interval $[0, 1)$. 

Let $\lambda(n)$ denote the number of length $n$ loops on $\mathbb{Z}$ based at 0, 
and let $p(n) = 2^{-n}\lambda(n)$ be the corresponding probability of return at time 
$n$ (regardless of whether this is the first return or not). Note that 
$\lambda(0) = p(0) = 1$. We have 

$$\lambda(n) = \begin{cases} 0, & \text{if } n \text{ odd} \\ \binom{n}{n/2}, & \text{if } n \text{ even.} \end{cases}$$

From Stirling's formula, we see that 

$$p(2k) \sim \frac{1}{\sqrt{\pi k}}$$

as $k \to \infty$. Thus the radius of convergence of 

$$R(z) = \sum_{n=0}^{\infty} p(n) z^n$$

is one. 

We can decompose the set of loops of given length according to the number of 
steps taken to the first return. This produces the equation 

$$\lambda(n) = \sum_{k=0}^{n} \alpha(k) \lambda(n-k)$$

for $n \geq 1$. Equivalently, since all probabilities are uniform, 

$$p(n) = \sum_{k=0}^{n} \phi(k) p(n-k).$$

Multiplying by $z^n$ and summing over $n \geq 1$, this becomes the identity 

$$R(z) - 1 = F(z) R(z)$$

in the algebra of holomorphic functions on the open unit disc in $\mathbb{C}$. Since 
$R(z)$ has non-negative coefficients, it is non-vanishing for $x \in [0, 1)$ and we 
can write 

$$F(x) = 1 - \frac{1}{R(x)}, \quad 0 \leq x < 1.$$

Thus 

$$F(1) = \lim_{x \to 1} F(x) = 1 - \frac{1}{\lim_{x \to 1} R(x)}.$$

If $R(1) < \infty$, then by Abel's theorem $\lim_{x \to 1} R(x) = R(1)$ and we obtain 
$F(1) < 1$. On the other hand, if $R(1) = \infty$, then 
$\lim_{x \to 1} R(x) = \infty$ and we get $F(1) = 1$. Thus the simple random walk is 
transient or recurrent according to the convergence or divergence of the series 
$\sum p(n)$. From the Stirling estimate above we find that this sum diverges, so the 
simple random walk on $\mathbb{Z}$ is recurrent. 
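
The first-return decomposition can be run as a recursion: knowing $p(n)$, one solves 
$p(n) = \sum_k \phi(k) p(n-k)$ for $\phi(n)$. The following sketch does this in exact 
rational arithmetic; the function names are illustrative choices made for this 
example. 

```python
from fractions import Fraction
from math import comb


def return_probabilities(n_max):
    """p(n) = 2^{-n} * (number of length-n loops on Z based at 0)."""
    return [Fraction(comb(n, n // 2), 2 ** n) if n % 2 == 0 else Fraction(0)
            for n in range(n_max + 1)]


def first_return_probabilities(n_max):
    """Solve p(n) = sum_{k=1}^{n} phi(k) p(n-k) for phi, using p(0) = 1."""
    p = return_probabilities(n_max)
    phi = [Fraction(0)] * (n_max + 1)
    for n in range(1, n_max + 1):
        phi[n] = p[n] - sum((phi[k] * p[n - k] for k in range(1, n)), Fraction(0))
    return phi


if __name__ == "__main__":
    phi = first_return_probabilities(6)
    print([str(q) for q in phi])
    # prints ['0', '0', '1/2', '0', '1/8', '0', '1/16']
```

The partial sums of $\phi$ satisfy $\sum_{k \leq n} \phi(k) = 1 - p(n)$ for even $n$, 
so they creep up to $F(1) = 1$ at the slow rate dictated by the Stirling estimate. 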

2.2. Polya's theorem. In the category of abelian groups, coproduct is direct sum: 

$$\coprod_{i \in I} G_i = \bigoplus_{i \in I} G_i.$$

In 1921, George Polya [30] proved that the simple random walk on 

$$\mathbb{Z}^d = \underbrace{\mathbb{Z} \oplus \dots \oplus \mathbb{Z}}_{d}$$

is recurrent for $d = 1, 2$ and transient for $d > 2$. This striking result can be 
deduced solely from an understanding of the simple random walk on $\mathbb{Z}$. 

Let us give a proof of Polya's theorem. Let $\lambda_d(n)$ denote the number of 
length $n$ loops on $\mathbb{Z}^d$ based at $0^d$. Let $p_d(n)$ denote the probability 
of return to $0^d$ after $n$ steps, 

$$p_d(n) = \frac{1}{(2d)^n} \lambda_d(n).$$

As above, the simple random walk on $\mathbb{Z}^d$ is recurrent if the sum 
$\sum p_d(n)$ diverges, and transient otherwise. Form the loop generating function 

$$L_d(z) = \sum_{n=0}^{\infty} \lambda_d(n) z^n.$$

We aim to prove that 

$$L_d\left(\frac{1}{2d}\right) = \sum_{n=0}^{\infty} p_d(n)$$

diverges for $d = 1, 2$ and converges for $d > 2$. 

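While the ordinary loop generating function is hard to analyze directly, the 
exponential loop generating function 

$$E_d(z) = \sum_{n=0}^{\infty} \lambda_d(n) \frac{z^n}{n!}$$

is quite accessible. Indeed, as in the last subsection we have 

$$\lambda_1(n) = \begin{cases} 0, & \text{if } n \text{ odd} \\ \binom{n}{n/2}, & \text{if } n \text{ even,} \end{cases}$$

so that 

$$E_1(z) = \sum_{k=0}^{\infty} \frac{z^{2k}}{k!\, k!} = I_0(2z)$$

is precisely the modified Bessel function of order zero. Since a loop on 
$\mathbb{Z}^d$ is just a shuffle of loops on $\mathbb{Z}$, the product formula for 
exponential generating functions [38] yields 

$$E_d(z) = E_1(z)^d = I_0(2z)^d.$$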

What we have is the exponential generating function for the loop counts 
$\lambda_d(n)$, and what we want is the ordinary generating function of this 
sequence. The integral transform 

$$(\mathcal{L}f)(z) = \int_0^\infty f(tz) e^{-t}\, dt,$$

which looks like the Laplace transform of $f$ but with the $z$-parameter in the wrong 
place, converts exponential generating functions into ordinary generating functions. 
This can be seen by differentiating under the integral sign and using the fact that 
the moments of the exponential distribution are the factorials, 

$$\int_0^\infty t^n e^{-t}\, dt = n!.$$

This trick is constantly used in quantum field theory in connection with Borel 
summation of divergent series [10]. In particular, we have 

$$L_d(z) = \int_0^\infty E_d(tz) e^{-t}\, dt = \int_0^\infty I_0(2tz)^d e^{-t}\, dt.$$





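Thus it remains only to show that the integral 

$$L_d\left(\frac{1}{2d}\right) = \int_0^\infty I_0\left(\frac{t}{d}\right)^d e^{-t}\, dt$$

is divergent for $d = 1, 2$ and convergent for $d > 2$. This in turn amounts to 
understanding the asymptotics of $I_0(t/d)$ as $t \to \infty$ along the real line. 

We already encountered Bessel functions in Lecture One, and we know that 

$$I_0\left(\frac{t}{d}\right) = \frac{1}{\pi} \int_0^\pi e^{\frac{t}{d}\cos\theta}\, d\theta.$$

This is an integral of Laplace type, 

$$\int_a^b e^{t f(\theta)}\, d\theta,$$

and Laplace integrals localize as $t \to \infty$ with asymptotics given by the 
classical steepest descent formula (maximum at an endpoint case), 

$$\int_a^b e^{t f(\theta)}\, d\theta \sim e^{t f(a)} \sqrt{\frac{\pi}{2t\, |f''(a)|}}.$$

For our integral, where $f(\theta) = \cos(\theta)/d$ is maximized at the endpoint 
$\theta = 0$, this specializes to 

$$I_0\left(\frac{t}{d}\right) \sim e^{t/d} \sqrt{\frac{d}{2\pi t}},$$

from which it follows that $L_d((2d)^{-1})$ diverges or converges according to the 
divergence or convergence of the integral 

$$\int_1^\infty t^{-d/2}\, dt.$$

This integral diverges for $d = 1, 2$ and converges for $d \geq 3$, which proves 
Polya's result. In fact, the probability that the simple random walk on 
$\mathbb{Z}^3$ returns to its initial position is already less than thirty five 
percent. 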
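
The shuffle description of loops translates directly into a computation: 
multiplying exponential generating functions corresponds to a binomial convolution 
of the coefficient sequences. The sketch below builds $\lambda_d(n)$ from 
$\lambda_1(n)$ in this way and forms partial sums of $\sum_n p_d(n)$. The function 
names and the cutoff used in the example are illustrative assumptions. 

```python
from math import comb


def loops_on_Z(n_max):
    """lambda_1(n): number of length-n loops on Z based at 0."""
    return [comb(n, n // 2) if n % 2 == 0 else 0 for n in range(n_max + 1)]


def loops_on_Zd(d, n_max):
    """lambda_d(n), obtained by shuffling d one-dimensional loops.

    E_d(z) = E_1(z)^d for exponential generating functions means each extra
    coordinate contributes one binomial convolution with lambda_1.
    """
    lam1 = loops_on_Z(n_max)
    lam = lam1[:]
    for _ in range(d - 1):
        lam = [sum(comb(n, k) * lam[k] * lam1[n - k] for k in range(n + 1))
               for n in range(n_max + 1)]
    return lam


def partial_return_sum(d, n_max):
    """Partial sum of p_d(n) = lambda_d(n) / (2d)^n for n = 0, ..., n_max."""
    lam = loops_on_Zd(d, n_max)
    return sum(lam[n] / (2 * d) ** n for n in range(n_max + 1))


if __name__ == "__main__":
    print(loops_on_Zd(2, 6))  # prints [1, 0, 4, 0, 36, 0, 400]
    print(loops_on_Zd(3, 4))  # prints [1, 0, 6, 0, 90]
    for d in (1, 2, 3):
        print(d, partial_return_sum(d, 200))
```

For $d = 1, 2$ the partial sums keep growing as the cutoff increases, while for 
$d = 3$ they stay bounded. Since $F(1) = 1 - 1/R(1)$, a limiting value of the sum 
near $1.52$ is consistent with a return probability below thirty five percent. 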

2.3. Kesten's problem. The category of abelian groups is a full subcategory of 
the category of groups. In the category of groups, coproduct is free product: 

$$\coprod_{i \in I} G_i = \mathop{*}_{i \in I} G_i.$$

Thus one could equally well ask about the recurrence or transience of the simple 
random walk on 

$$F_d = \underbrace{\mathbb{Z} * \dots * \mathbb{Z}}_{d},$$

the free group on $d$ generators. Whereas the Cayley graph of the abelian group 
$\mathbb{Z}^d$ is the $(2d)$-regular hypercubic lattice, the Cayley graph of the free 
group $F_d$ is the $(2d)$-regular tree, see Figure 13. What is the free analogue of 
Polya's theorem? We will see that the random walk on $F_d$ can be understood entirely 
in terms of the random walk on $F_1 = \mathbb{Z}$, just like in the abelian category. 
However, the tools we will use are quite different, and the concept of free random 
variables plays the central role. 

Figure 13. Balls of increasing radius in $F_2$. 

The study of random walks on groups was initiated by Harry Kesten in his 1958 
Ph.D. thesis, with published results appearing in [3D]. A good source of information 
on this topic, with many pointers to the literature, is Laurent Saloff-Coste's survey 
article [33]. Kesten related the behaviour of the simple random walk on a 
finitely-generated group $G$ to other properties of $G$, such as amenability. A 
countable group is said to be amenable if it admits a finitely additive 
$G$-invariant probability measure. The notion of amenability was introduced by John 
von Neumann in 1929. Finite groups are amenable since they can be equipped with the 
uniform measure $P(g) = |G|^{-1}$. For infinite groups the situation is not so clear, 
and many different characterizations of amenability have been derived. For example, 
Alain Connes showed that a group is amenable if and only if its von Neumann algebra 
is hyperfinite. Kesten proved that $G$ is non-amenable if and only if the probability 
$p_G(n)$ that the simple random walk on $G$ returns to its starting point at time $n$ 
decays exponentially in $n$. We saw above that for $G = \mathbb{Z}$ the return 
probability has square root decay, so $\mathbb{Z}$ is amenable. In fact, amenability 
is preserved by direct sum so all abelian groups are amenable. Is the free group 
$F_d$ amenable? Let $\lambda_d(n)$ denote the number of length $n$ loops on $F_d$ 
based at id. We will refer to the problem of finding an explicit expression for the 
loop generating function 

$$L_d(z) = 1 + \sum_{n=1}^{\infty} \lambda_d(n) z^n$$

as Kesten's problem. Presumably, if we can obtain an explicit expression for this 
function then we can read off the asymptotics of $p_d(n)$, which is the coefficient 
of $z^n$ in $L_d(z/(2d))$, via the usual methods of singularity analysis of 
generating functions. 
We begin at the beginning: $d = 2$. Let $A$ and $B$ denote the generators of $F_2$, 
and let $\mathcal{A} = \mathcal{A}[F_2]$ be the group algebra consisting of formal 
$\mathbb{C}$-linear combinations of words in these generators and their inverses, 
$A^{-1}$ and $B^{-1}$. The identity element of $\mathcal{A}$ is the empty word, which 
is identified with $\mathrm{id} \in F_2$. Introduce the expectation functional 

$$\tau[X] = \text{coefficient of id in } X$$

for each $X \in \mathcal{A}$. Then $(\mathcal{A}, \tau)$ is a non-commutative 
probability space. A loop $\mathrm{id} \to \mathrm{id}$ in $F_2$ is simply a word in 
$A, A^{-1}, B, B^{-1}$ which reduces to id. Thus the number of length $n$ loops in 
$F_2$ is 

$$\lambda_2(n) = m_n(X + Y) = \tau[(X + Y)^n],$$

where $X, Y \in \mathcal{A}$ are the random variables 

$$X = A + A^{-1}, \quad Y = B + B^{-1}.$$

We see that the loop generating function for $F_2$ is precisely the moment generating 
function for the random variable $X + Y$ in the non-commutative probability space 
$(\mathcal{A}, \tau)$, 

$$L_2(z) = 1 + \sum_{n=1}^{\infty} m_n(X + Y) z^n.$$
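
For small $n$ the moments $\tau[(X+Y)^n]$ can be computed literally, by multiplying 
out in the group algebra. In the sketch below an element of $\mathcal{A}[F_2]$ is a 
dictionary from reduced words to integer coefficients; encoding the generators 
$A, A^{-1}, B, B^{-1}$ as the integers $1, -1, 2, -2$ is a hypothetical convention 
chosen for this example. 

```python
from collections import defaultdict

# Hypothetical encoding: A = 1, A^{-1} = -1, B = 2, B^{-1} = -2.
GENERATORS = (1, -1, 2, -2)


def times_generator(element, g):
    """Right-multiply a group algebra element by the generator g."""
    out = defaultdict(int)
    for word, coeff in element.items():
        if word and word[-1] == -g:
            new_word = word[:-1]      # cancellation: ... g^{-1} g
        else:
            new_word = word + (g,)    # the word stays reduced
        out[new_word] += coeff
    return out


def loop_counts_F2(n_max):
    """tau[(X + Y)^n] for n = 0, ..., n_max, with X + Y = A + A^-1 + B + B^-1."""
    power = {(): 1}                   # (X + Y)^0 is the empty word
    counts = [1]
    for _ in range(n_max):
        nxt = defaultdict(int)
        for g in GENERATORS:
            for word, coeff in times_generator(power, g).items():
                nxt[word] += coeff
        power = nxt
        counts.append(power.get((), 0))   # tau reads off the coefficient of id
    return counts


if __name__ == "__main__":
    print(loop_counts_F2(6))  # prints [1, 0, 4, 0, 28, 0, 232]
```

The number of reduced words grows exponentially with $n$, so this direct expansion 
is only a sanity check and not a route to the generating function $L_2(z)$. 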

We want to compute the moments of the sum $X + Y$ of two non-commutative 
random variables, and what we know are the moments of its summands: 

$$m_n(X) = m_n(Y) = \begin{cases} 0, & \text{if } n \text{ odd} \\ \binom{n}{n/2}, & \text{if } n \text{ even.} \end{cases}$$

Now we make the key observation: the random variables $X, Y$ are freely 
independent. Indeed, suppose that $f_1, g_1, \dots, f_k, g_k$ are polynomials such 
that 

$$\tau[f_1(X)] = \tau[g_1(Y)] = \dots = \tau[f_k(X)] = \tau[g_k(Y)] = 0.$$

This means that $f_i(X) = f_i(A + A^{-1})$ is a Laurent polynomial in $A$ with zero 
constant term, and $g_j(Y) = g_j(B + B^{-1})$ is a Laurent polynomial in $B$ with zero 
constant term. Since there are no relations between $A$ and $B$, an alternating 
product of polynomials of this form cannot produce any occurrences of the empty word, 
and we have 

$$\tau[f_1(X) g_1(Y) \cdots f_k(X) g_k(Y)] = 0.$$

This is precisely Voiculescu's definition of free independence. 

We conclude that the problem of computing $\lambda_2(n)$ is a particular case of the 
problem of computing the moments $m_n(X + Y)$ of the sum of two free random 
variables given their individual moments, $m_n(X)$ and $m_n(Y)$. This motivates us 
to solve a fundamental problem in free probability theory: 

Given a pair of free random variables $X$ and $Y$, compute the moments of $X + Y$ 
in terms of the moments of $X$ and the moments of $Y$. 

We can, in principle, solve this problem using the fact that free cumulants 
linearize the addition of free random variables, 
$\kappa_n(X + Y) = \kappa_n(X) + \kappa_n(Y)$. This solution is implemented as the 
following recursive algorithm. 

Input: $\kappa_1(X), \dots, \kappa_{n-1}(X)$, $\kappa_1(Y), \dots, \kappa_{n-1}(Y)$. 

Step 1: Compute $m_n(X), m_n(Y)$. 

Step 2: Compute $\kappa_n(X), \kappa_n(Y)$ using 

$$\kappa_n(X) = m_n(X) - \sum_{\substack{\pi \in NC(n) \\ b(\pi) \geq 2}} \prod_{B \in \pi} \kappa_{|B|}(X),$$

$$\kappa_n(Y) = m_n(Y) - \sum_{\substack{\pi \in NC(n) \\ b(\pi) \geq 2}} \prod_{B \in \pi} \kappa_{|B|}(Y).$$

Step 3: Add, 

$$\kappa_n(X + Y) = \kappa_n(X) + \kappa_n(Y).$$

Step 4: Compute $m_n(X + Y)$ using 

$$m_n(X + Y) = \kappa_n(X + Y) + \sum_{\substack{\pi \in NC(n) \\ b(\pi) \geq 2}} \prod_{B \in \pi} \kappa_{|B|}(X + Y).$$

Output: $m_n(X + Y)$. 

This recursive algorithm is conceptually simple but virtually useless as is. In 
particular, it is not clear how to coax it into computing the loop generating function 
$L_2(z)$. We need to develop an additive calculus of free random variables which 
parallels the additive calculus of classically independent random variables. 
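
For small $n$ the recursive algorithm can nevertheless be run by brute force. The 
following sketch enumerates $NC(n)$ directly, converts moments to free cumulants, 
adds, and converts back. The function names and the list convention (index $n$ holds 
$m_n$ or $\kappa_n$, index 0 is a placeholder) are assumptions made for this 
illustration. 

```python
from itertools import combinations, product
from math import comb, prod


def noncrossing_partitions(points):
    """Yield every non-crossing partition of an ordered list of points."""
    points = list(points)
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for size in range(len(rest) + 1):
        for chosen in combinations(range(len(rest)), size):
            block = [first] + [rest[i] for i in chosen]
            cuts = [-1, *chosen, len(rest)]
            gaps = [rest[cuts[j] + 1:cuts[j + 1]] for j in range(len(cuts) - 1)]
            fillings = product(*(list(noncrossing_partitions(g)) for g in gaps))
            for filling in fillings:
                yield [block] + [b for part in filling for b in part]


def sum_over_nc(kappa, n, min_blocks):
    """Sum over pi in NC(n) with b(pi) >= min_blocks of prod_B kappa[|B|]."""
    return sum(prod(kappa[len(block)] for block in pi)
               for pi in noncrossing_partitions(range(n))
               if len(pi) >= min_blocks)


def free_cumulants(moments):
    """Step 2: kappa_n = m_n minus the terms with at least two blocks."""
    kappa = [0] * len(moments)
    for n in range(1, len(moments)):
        kappa[n] = moments[n] - sum_over_nc(kappa, n, 2)
    return kappa


def moments_from_free_cumulants(kappa):
    """Step 4: m_n is the sum over all of NC(n)."""
    return [1] + [sum_over_nc(kappa, n, 1) for n in range(1, len(kappa))]


def free_sum_moments(moments_x, moments_y):
    """Moments of X + Y for free X, Y, via additivity of free cumulants."""
    kappa_sum = [a + b for a, b in
                 zip(free_cumulants(moments_x), free_cumulants(moments_y))]
    return moments_from_free_cumulants(kappa_sum)


if __name__ == "__main__":
    m = [comb(n, n // 2) if n % 2 == 0 else 0 for n in range(7)]
    print(free_cumulants(m))       # prints [0, 0, 2, 0, -2, 0, 4]
    print(free_sum_moments(m, m))  # prints [1, 0, 4, 0, 28, 0, 232]
```

Feeding in the moments of $X = A + A^{-1}$ and $Y = B + B^{-1}$ gives the values 
$4, 28, 232$ for the numbers of loops of length $2, 4, 6$ on $F_2$. The cost grows 
with the Catalan numbers, which is one concrete sense in which the algorithm is 
impractical as stated. 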

2.4. The classical algorithm. If $X, Y$ are classically independent random 
variables, we can compute the moments of their sum $X + Y$ using the recursive 
algorithm above, replacing free cumulants with classical cumulants. But this is not 
what probabilists do in their daily lives. They have a much better algorithm which 
uses analytic function theory to efficiently handle the recursive nature of the naive 
algorithm. The classical algorithm associates to $X$ and $Y$ analytic functions 
$M_X(z)$ and $M_Y(z)$ which have the property that $M_{X+Y}(z) := M_X(z) M_Y(z)$ 
encodes the moments of $X + Y$ as its derivatives at $z = 0$. We will give a somewhat 
roundabout derivation of this algorithm, which is presented in this way specifically 
to highlight the analogy with Voiculescu's algorithm presented in the next section. 

The classical algorithm for summing two random variables is developed in two 
stages. In the first stage, the relation between the moments and classical cumulants 
of a random variable is packaged as an identity in the ring of formal power series 
$\mathbb{C}[[z]]$. Suppose that $(m_n)_{n \geq 1}$ and $(c_n)_{n \geq 1}$ are two 
numerical sequences related by the chain of identities 

$$m_n = \sum_{\pi \in P(n)} \prod_{B \in \pi} c_{|B|}, \quad n \geq 1.$$

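
The $\pi$-th term of the sum on the right only depends on the "spectrum" of $\pi$, 
i.e. the integer vector 
$\Lambda(\pi) = (1^{b_1(\pi)}, 2^{b_2(\pi)}, \dots, n^{b_n(\pi)})$, where $b_i(\pi)$ 
is the number of blocks of size $i$ in $\pi$. We may view $\Lambda(\pi)$ as the Young 
diagram with $b_i$ rows of length $i$. Consequently, we can perform a change of 
variables to push the summation forward onto a sum over Young diagrams with $n$ boxes 
provided we can compute the "Jacobian" of the map $\Lambda : P(n) \to Y(n)$ sending 
$\pi$ to its spectrum: 

$$m_n = \sum_{b_1 + 2b_2 + \dots + n b_n = n} c_1^{b_1} c_2^{b_2} \cdots c_n^{b_n} \left| \Lambda^{-1}(1^{b_1} 2^{b_2} \dots n^{b_n}) \right|.$$

The volume of the fibre of $\Lambda$ over any given Young diagram can be explicitly 
computed, 

$$\left| \Lambda^{-1}(1^{b_1} 2^{b_2} \dots n^{b_n}) \right| = \frac{n!}{(1!)^{b_1} (2!)^{b_2} \cdots (n!)^{b_n}\, b_1!\, b_2! \cdots b_n!},$$

so that we have the chain of identities 

$$\frac{m_n}{n!} = \sum_{b_1 + 2b_2 + \dots + n b_n = n} \frac{(c_1/1!)^{b_1} (c_2/2!)^{b_2} \cdots (c_n/n!)^{b_n}}{b_1!\, b_2! \cdots b_n!}.$$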
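
The fibre-volume formula can be verified by direct enumeration for small $n$. The 
sketch below lists all set partitions of an $n$-element set, tallies them by block 
sizes, and compares each tally with the closed formula. The function names are 
illustrative and the enumeration is only practical for small $n$. 

```python
from collections import Counter
from math import factorial, prod


def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either `first` forms a new singleton block ...
        yield [[first]] + partition
        # ... or it joins one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def fibre_sizes(n):
    """Count the partitions of an n-set by spectrum (sorted block sizes)."""
    return Counter(tuple(sorted(len(block) for block in pi))
                   for pi in set_partitions(range(n)))


def fibre_formula(block_sizes):
    """n! / prod_i ((i!)^{b_i} * b_i!) for the given multiset of block sizes."""
    n = sum(block_sizes)
    b = Counter(block_sizes)
    denominator = prod(factorial(i) ** b_i * factorial(b_i)
                       for i, b_i in b.items())
    return factorial(n) // denominator


if __name__ == "__main__":
    for spectrum, count in sorted(fibre_sizes(4).items()):
        print(spectrum, count, fibre_formula(spectrum))
```

For $n = 4$, for instance, there are 3 partitions into two blocks of size two and 6 
partitions with block sizes $(1, 1, 2)$, in agreement with the formula. 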

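We can bundle these identities as a single relation between power series. 
Multiplying by $z^n$ and summing over $n$ we obtain 

$$\begin{aligned}
1 + \sum_{n=1}^{\infty} m_n \frac{z^n}{n!} &= 1 + \sum_{n=1}^{\infty} \sum_{b_1 + 2b_2 + \dots + n b_n = n} \frac{(c_1 z/1!)^{b_1} (c_2 z^2/2!)^{b_2} \cdots (c_n z^n/n!)^{b_n}}{b_1!\, b_2! \cdots b_n!} \\
&= 1 + \frac{1}{1!} \left( \sum_{n=1}^{\infty} c_n \frac{z^n}{n!} \right) + \frac{1}{2!} \left( \sum_{n=1}^{\infty} c_n \frac{z^n}{n!} \right)^2 + \dots
\end{aligned}$$

We conclude that the chain of moment-cumulant formulas is equivalent to the single 
identity $M(z) = e^{C(z)}$ in $\mathbb{C}[[z]]$, where 

$$M(z) = 1 + \sum_{n=1}^{\infty} m_n \frac{z^n}{n!}, \quad C(z) = \sum_{n=1}^{\infty} c_n \frac{z^n}{n!}.$$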

This fact is known in enumerative combinatorics as the exponential formula. In 
other branches of science it goes by other names, such as the polymer expansion 
formula or the linked cluster theorem. In the physics literature, the exponential 
formula is often invoked using colourful phrases such as "connected vacuum bubbles 
exponentiate" [33]. The exponential formula seems to have been first written down 
precisely by Adolf Hurwitz in 1891 [19]. 

The exponential formula becomes particularly powerful when combined with 
complex analysis. Suppose that $X, Y$ are classically independent random variables 
living in a non-commutative probability space $(\mathcal{A}, \tau)$. Suppose moreover 
that an oracle has given us probability measures $\mu_X, \mu_Y$ on the real line 
which behave like distributions for $X, Y$ insofar as 

$$\tau[X^n] = \int_{\mathbb{R}} t^n \mu_X(dt), \quad \tau[Y^n] = \int_{\mathbb{R}} t^n \mu_Y(dt), \quad n \geq 1.$$

Let us ask for even more, and insist that $\mu_X, \mu_Y$ are compactly supported. 
Then the functions 

$$M_X(z) = \int_{\mathbb{R}} e^{tz} \mu_X(dt), \quad M_Y(z) = \int_{\mathbb{R}} e^{tz} \mu_Y(dt)$$

are entire, and their derivatives can be computed by differentiation under the 
integral sign. Consequently, we have the globally convergent power series expansions 

$$M_X(z) = 1 + \sum_{n=1}^{\infty} m_n(X) \frac{z^n}{n!}, \quad M_Y(z) = 1 + \sum_{n=1}^{\infty} m_n(Y) \frac{z^n}{n!}.$$

Since $M_X(0) = M_Y(0) = 1$ and the zeros of holomorphic functions are discrete, 
we can restrict to a complex domain $D$ containing the origin on which 
$M_X(z), M_Y(z)$ are non-vanishing. Let Hol(D) denote the algebra of holomorphic 
functions on $D$. The following algorithm produces a function 
$M_{X+Y}(z) \in \mathrm{Hol}(D)$ whose derivatives at $z = 0$ are the moments of 
$X + Y$. 

Input: $\mu_X$ and $\mu_Y$. 

Step 1: Compute 

$$M_X(z) = \int_{\mathbb{R}} e^{tz} \mu_X(dt), \quad M_Y(z) = \int_{\mathbb{R}} e^{tz} \mu_Y(dt).$$

Step 2: Solve 

$$M_X(z) = e^{C_X(z)}, \quad M_Y(z) = e^{C_Y(z)}$$

in Hol(D) subject to $C_X(0) = C_Y(0) = 0$. 

Step 3: Add, 

$$C_{X+Y}(z) := C_X(z) + C_Y(z).$$

Step 4: Exponentiate, 

$$M_{X+Y}(z) := e^{C_{X+Y}(z)}.$$

Output: $M_{X+Y}(z)$. 

In Step One, we try to compute the integral transforms $M_X(z), M_Y(z)$ in terms of 
elementary functions, like $e^z$, $\log(z)$, $\sin(z)$, $\cos(z)$, $\sinh(z)$, 
$\cosh(z)$, etc., or other classical functions like Bessel functions, Whittaker 
functions, or anything else that can be looked up in PJ . This is often feasible if 
the distributions $\mu_X, \mu_Y$ have known densities, and we saw some examples in 
Lecture One. 

Footnote: The restriction of $M_X$ to the real axis, $M_X(-x)$, is the two-sided 
Laplace transform, while the restriction of $M_X$ to the imaginary axis, $M_X(-iy)$, 
is the Fourier transform. 

The equations in Step Two have unique solutions. The required functions 
$C_X(z), C_Y(z) \in \mathrm{Hol}(D)$ are the principal branches of the logarithms of 
$M_X(z), M_Y(z)$ on $D$, and can be represented as contour integrals: 

$$C_X(z) = \log M_X(z) = \int_0^z \frac{M_X'(\zeta)}{M_X(\zeta)}\, d\zeta, \quad C_Y(z) = \log M_Y(z) = \int_0^z \frac{M_Y'(\zeta)}{M_Y(\zeta)}\, d\zeta$$

for $z \in D$. Since log has the usual formal properties associated with the 
logarithm, if Step One outputs a reasonably explicit expression then so will Step Two. 

Step Two is the crux of the algorithm. It is performed precisely to change 
gears from a moment computation to a cumulant computation. Appealing to the 
exponential formula, we conclude that the holomorphic functions $C_X(z), C_Y(z)$ 
passed to Step Three by Step Two have Maclaurin series 

$$C_X(z) = \sum_{n=1}^{\infty} c_n(X) \frac{z^n}{n!}, \quad C_Y(z) = \sum_{n=1}^{\infty} c_n(Y) \frac{z^n}{n!},$$

where $c_n(X), c_n(Y)$ are the cumulants of $X$ and $Y$. Since cumulants linearize the 
addition of independent random variables, the new function 
$C_{X+Y}(z) := C_X(z) + C_Y(z)$ defined in Step Three encodes the cumulants of 
$X + Y$ as its derivatives at $z = 0$. 

In Step Four we define a new function $M_{X+Y}(z) \in \mathrm{Hol}(D)$ by 
$M_{X+Y}(z) := e^{C_{X+Y}(z)}$. The exponential formula and the moment-cumulant 
formula now combine in the reverse direction to tell us that the Maclaurin series of 
$M_{X+Y}(z)$ is 

$$M_{X+Y}(z) = 1 + \sum_{n=1}^{\infty} m_n(X + Y) \frac{z^n}{n!}.$$

In summary, assuming that X, Y are classically independent random variables
living in a non-commutative probability space $(\mathcal A, \tau)$ with affiliated distributions
$\mu_X, \mu_Y$ having nice properties, the classical algorithm takes these distributions
as input and outputs a function $M_{X+Y}(z)$ analytic at z = 0 whose derivatives
are the moments of X + Y. It works by combining the exponential formula and
the moment-cumulant formula to convert the moment problem into the (linear)
cumulant problem, adding, and then converting back to moments. An optional
Fifth Step is to extract the distribution $\mu_{X+Y}$ from $M_{X+Y}(z)$ using the Fourier
inversion formula:

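\[ \mu_{X+Y}([a,b]) = \lim_{T\to\infty} \frac{1}{2\pi}\int_{-T}^{T} \frac{e^{-ita}-e^{-itb}}{it}\,M_{X+Y}(it)\,dt \]

(valid when $a$ and $b$ are not atoms of $\mu_{X+Y}$).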
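At the level of moment sequences, Steps Two to Four of the classical algorithm can be sketched in a few lines. The sketch below is an illustration rather than part of the lectures: it assumes the standard recursion $m_n = \sum_{k=1}^{n}\binom{n-1}{k-1}c_k m_{n-k}$ (equivalent to the exponential formula), and the function names are ours.

```python
from math import comb

def cumulants_from_moments(moments):
    """moments = [m_1, ..., m_N]  ->  classical cumulants [c_1, ..., c_N]."""
    m = [1] + list(moments)          # m[0] = 1 by convention
    c = [0] * len(m)                 # c[0] is unused
    for n in range(1, len(m)):
        # m_n = sum_{k=1}^{n} C(n-1, k-1) c_k m_{n-k}; solve for c_n
        c[n] = m[n] - sum(comb(n - 1, k - 1) * c[k] * m[n - k] for k in range(1, n))
    return c[1:]

def moments_from_cumulants(cumulants):
    """cumulants = [c_1, ..., c_N]  ->  moments [m_1, ..., m_N]."""
    c = [0] + list(cumulants)
    m = [1]
    for n in range(1, len(c)):
        m.append(sum(comb(n - 1, k - 1) * c[k] * m[n - k] for k in range(1, n + 1)))
    return m[1:]

def classical_sum_moments(moments_x, moments_y):
    """Moments of X + Y for classically independent X, Y (cumulants add)."""
    cx = cumulants_from_moments(moments_x)
    cy = cumulants_from_moments(moments_y)
    return moments_from_cumulants([a + b for a, b in zip(cx, cy)])

# Two independent fair +-1 coins: X + Y takes the values -2, 0, 2
# with probabilities 1/4, 1/2, 1/4.
coin = [0, 1, 0, 1, 0, 1]
print(classical_sum_moments(coin, coin))   # [0, 2, 0, 8, 0, 32]
```

The even moments $2, 8, 32$ agree with $\frac14(-2)^{2k} + \frac14 2^{2k} = 2^{2k-1}$, as they should.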

2.5. Voiculescu's algorithm. We wish to develop a free analogue of the classical
algorithm. Suppose that X, Y are freely independent random variables living in
a non-commutative probability space $(\mathcal A, \tau)$ possessing compactly supported real
distributions $\mu_X, \mu_Y$. The free algorithm should take these distributions as input,
build a pair of analytic functions which encode the moments of X and Y respectively,
and then convolve these somehow to produce a new analytic function which
encodes the moments of X + Y. A basic hurdle to be overcome is that, even assuming
we know how to construct $\mu_X$ and $\mu_Y$, we don't know what to do with them.
We could repeat Step One of the classical algorithm to obtain analytic functions
$M_X(z), M_Y(z)$ whose derivatives at z = 0 are the moments of X and Y. If we then
perform Step Two we obtain analytic functions $C_X(z), C_Y(z)$ whose derivatives encode
the classical cumulants of X and Y. But classical cumulants do not linearize
addition of free random variables.

The classical algorithm is predicated on the existence of a formal power series
identity equivalent to the chain of classical moment-cumulant identities. We need
a free analogue of this, namely a power series identity equivalent to the chain of
numerical identities

\[ m_n = \sum_{\pi \in \mathrm{NC}(n)} \prod_{B \in \pi} \kappa_{|B|}, \qquad n \geq 1. \]

Proceeding as in the classical case, rewrite this in the form

\[ m_n = \sum_{b_1 + 2b_2 + \dots + nb_n = n} \kappa_1^{b_1}\kappa_2^{b_2}\cdots\kappa_n^{b_n}\,\big|\Lambda^{-1}(1^{b_1}2^{b_2}\dots n^{b_n}) \cap \mathrm{NC}(n)\big|, \]

where as above $\Lambda : P(n) \to Y(n)$ is the surjection which sends a partition $\pi$ with
$b_i$ blocks of size $i$ to the Young diagram with $b_i$ rows of length $i$. Now we have to
compute the volume of the fibres of $\Lambda$ intersected with the non-crossing partition
lattice. The solution to this enumeration problem is again known in explicit form,

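\[ \big|\Lambda^{-1}(1^{b_1}2^{b_2}\dots n^{b_n}) \cap \mathrm{NC}(n)\big| = \frac{n!}{\big(n+1-(b_1+\dots+b_n)\big)!\; b_1!\,b_2!\cdots b_n!}. \]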

This formula allows us to obtain the desired power series identity, though the
manipulations required are quite involved and require either the use of Lagrange
inversion or an understanding of the poset structure of NC(n). In any event, what
ultimately comes out of the computation is the fact that two numerical sequences
$(m_n)_{n \geq 1}$, $(\kappa_n)_{n \geq 1}$ satisfy the chain of free moment-cumulant identities if and
only if the ordinary (not exponential) generating functions

\[ L(z) = 1 + \sum_{n=1}^{\infty} m_n z^n, \qquad K(z) = 1 + \sum_{n=1}^{\infty} \kappa_n z^n \]

solve the equation

\[ L(z) = K(zL(z)) \]

in the formal power series ring $\mathbb{C}[[z]]$. This is the free analogue of the exponential
formula.
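The enumeration of non-crossing partitions by block type can be checked by brute force for small n. The following sketch is ours, not the author's: it lists all set partitions of [n], keeps the non-crossing ones, tallies them by type $(b_1, \dots, b_n)$, and compares with the closed form $n!/\big((n+1-\sum b_i)!\,\prod b_i!\big)$, which we take as the assumed form of the counting formula.

```python
from collections import Counter
from itertools import combinations
from math import factorial

def set_partitions(n):
    """Yield all partitions of {1,...,n} as lists of increasing blocks."""
    if n == 0:
        yield []
        return
    for smaller in set_partitions(n - 1):
        # put n into one of the existing blocks ...
        for i in range(len(smaller)):
            yield smaller[:i] + [smaller[i] + [n]] + smaller[i + 1:]
        # ... or into a new singleton block
        yield smaller + [[n]]

def is_noncrossing(partition):
    """No a < b < c < d with a, c in one block and b, d in another."""
    for A, B in combinations(partition, 2):
        for a, c in combinations(A, 2):
            for b, d in combinations(B, 2):
                if a < b < c < d or b < a < d < c:
                    return False
    return True

def noncrossing_type_counts(n):
    """Map each type (b_1,...,b_n) to the number of non-crossing partitions of that type."""
    counts = Counter()
    for p in set_partitions(n):
        if is_noncrossing(p):
            sizes = Counter(len(block) for block in p)
            counts[tuple(sizes.get(i, 0) for i in range(1, n + 1))] += 1
    return counts

def closed_form(n, block_type):
    """n! / ((n + 1 - #blocks)! * b_1! * ... * b_n!)."""
    denominator = factorial(n + 1 - sum(block_type))
    for b in block_type:
        denominator *= factorial(b)
    return factorial(n) // denominator

for n in range(1, 7):
    counts = noncrossing_type_counts(n)
    assert all(counts[t] == closed_form(n, t) for t in counts)
```

Summing the counts over all types recovers the Catalan numbers 1, 2, 5, 14, 42, 132 for n = 1, ..., 6.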

As in the classical case, we wish to turn this formal power series encoding into
an analytic encoding. Suppose that X, Y admit distributions $\mu_X, \mu_Y$ supported
in the real interval $[-r, r]$. We then have $|m_n(X)|, |m_n(Y)| \leq r^n$, so the moment
generating functions

\[ L_X(z) = 1 + \sum_{n=1}^{\infty} m_n(X) z^n, \qquad L_Y(z) = 1 + \sum_{n=1}^{\infty} m_n(Y) z^n \]

are absolutely convergent in the open disc $D(0, \frac{1}{r})$. One can use the relation between
moments and free cumulants to show that the free cumulant generating functions

\[ K_X(z) = 1 + \sum_{n=1}^{\infty} \kappa_n(X) z^n, \qquad K_Y(z) = 1 + \sum_{n=1}^{\infty} \kappa_n(Y) z^n \]

are also absolutely convergent on a (possibly smaller) neighbourhood of z = 0.
However, it turns out that the correct environment for the free algorithm is a
neighbourhood of infinity rather than a neighbourhood of zero. This is because
what we really want is an integral transform which realizes ordinary generating
functions in the same way as the Fourier (or Laplace) transform realizes exponential
generating functions. Access to such a transform will allow us to obtain closed forms
for generating functions by evaluating integrals, just like in classical probability.
Such an object is well-known in analysis, where it goes by the name of the Cauchy
(or Stieltjes) transform. The Cauchy transform of a random variable X with real
distribution $\mu_X$ is

\[ G_X(z) = \int_{\mathbb{R}} \frac{1}{z-t}\,\mu_X(dt). \]

The Cauchy transform is well-defined on the complement of the support of $\mu_X$,
and differentiating under the integral sign shows that $G_X(z)$ is holomorphic on its
domain of definition. In particular, if $\mu_X$ is supported in $[-r, r]$ then $G_X(z)$ admits
the convergent Laurent expansion

\[ G_X(z) = \frac{1}{z} + \sum_{n=1}^{\infty} \frac{m_n(X)}{z^{n+1}} \]

on $|z| > r$. This is an ordinary generating function for the moments of X with $z^{-1}$
playing the role of the formal variable.

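To create an interface between the free moment-cumulant formula and the Cauchy
transform, we must re-write the formal power series identity $L(z) = K(zL(z))$ as
an identity in $\mathbb{C}((z)) = \mathrm{Quot}\,\mathbb{C}[[z]]$, the field of formal Laurent series. Introduce
the formal Laurent series

\[ G(z) = \frac{1}{z}L\Big(\frac{1}{z}\Big) = \frac{1}{z} + \sum_{n=1}^{\infty} \frac{m_n}{z^{n+1}}. \]

The automorphism $z \mapsto \frac{1}{z}$ transforms the non-crossing exponential formula into
the identity

\[ \frac{K(G(z))}{G(z)} = z. \]

Setting

\[ V(w) = \frac{K(w)}{w} = \frac{1}{w} + \sum_{n=0}^{\infty} \kappa_{n+1} w^n, \]

this becomes the identity

\[ V(G(z)) = z \]

in $\mathbb{C}((z))$.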
We have now associated two analytic functions to X. The first is the Cauchy
transform $G_X(z)$, which is defined as an integral transform and admits a convergent
Laurent expansion in a neighbourhood of infinity in the z-plane. The second is the
Voiculescu transform $V_X(w)$, which is defined by the convergent Laurent series

\[ V_X(w) = \frac{1}{w} + \sum_{n=0}^{\infty} \kappa_{n+1}(X) w^n \]

in a neighbourhood of zero in the w-plane. The Voiculescu transform is a meromorphic
function with a simple pole of residue one at w = 0. The Voiculescu transform
less its principal part, $R_X(w) = V_X(w) - \frac{1}{w}$, is an analytic function known as the
R-transform of X. From the formal identities $V(G(z)) = z$, $G(V(w)) = w$ and
the asymptotics $G_X(z) \sim \frac{1}{z}$ as $|z| \to \infty$ and $V_X(w) \sim \frac{1}{w}$ as $|w| \to 0$, we expect
to find a neighbourhood $D_\infty$ of infinity in the z-plane and a neighbourhood $D_0$ of
zero in the w-plane such that $G_X : D_\infty \to D_0$ and $V_X : D_0 \to D_\infty$ are mutually
inverse holomorphic bijections. The existence of the required domains hinges on
identifying regions where the Cauchy and Voiculescu transforms are injective, and 
this can be established through a complex-analytic argument, see p4l Chapter 4]. 

With these pieces in place, we can state Voiculescu's algorithm for the addition
of free random variables.

Input: $\mu_X$ and $\mu_Y$.
Step 1: Compute

\[ G_X(z) = \int_{\mathbb{R}} \frac{1}{z-t}\,\mu_X(dt), \qquad G_Y(z) = \int_{\mathbb{R}} \frac{1}{z-t}\,\mu_Y(dt). \]

Step 2: Solve the first Voiculescu functional equations,

\[ (G_X \circ V_X)(w) = w, \qquad (G_Y \circ V_Y)(w) = w \]

subject to $V_X(w), V_Y(w) \sim \frac{1}{w}$ near w = 0.
Step 3: Remove principal part,

\[ R_X(w) := V_X(w) - \frac{1}{w}, \qquad R_Y(w) := V_Y(w) - \frac{1}{w}, \]

add,

\[ R_{X+Y}(w) := R_X(w) + R_Y(w), \]

restore principal part,

\[ V_{X+Y}(w) := R_{X+Y}(w) + \frac{1}{w}. \]

Step 4: Solve the second Voiculescu functional equation,

\[ (V_{X+Y} \circ G_{X+Y})(z) = z, \]

subject to $G_{X+Y}(z) \sim \frac{1}{z}$ near $z = \infty$.

Output: $G_{X+Y}(z)$.
Voiculescu's algorithm is directly analogous to the classical algorithm presented
in the previous section. The analogy can be succinctly summarized as follows:

The R-transform is the free analogue of the logarithm of the Fourier transform.

In Step One, we try to compute the integral transforms $G_X(z), G_Y(z)$ in terms
of elementary functions.

Step Two changes gears from a moment computation to a cumulant computation.
Since free cumulants linearize the addition of free random variables, the new
function $V_{X+Y}(w) := R_X(w) + R_Y(w) + \frac{1}{w}$ defined in Step Three encodes the free
cumulants $\kappa_n(X+Y)$ of X + Y as its Laurent coefficients of non-negative degree.

In Step Four we define a new function $G_{X+Y}(z)$ by solving the second Voiculescu
functional equation. The free exponential formula and the free moment-cumulant
formula combine in the reverse direction to tell us that the Laurent series of
$G_{X+Y}(z)$ is

\[ G_{X+Y}(z) = \frac{1}{z} + \sum_{n=1}^{\infty} \frac{m_n(X+Y)}{z^{n+1}}. \]

An optional Fifth Step is to extract the distribution $\mu_{X+Y}$ from $G_{X+Y}(z)$ using
the Stieltjes inversion formula:

\[ \mu_{X+Y}(dt) = -\frac{1}{\pi} \lim_{\varepsilon \to 0^+} \Im\, G_{X+Y}(t + i\varepsilon)\,dt. \]
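At the level of formal power series, Steps Two to Four amount to converting moments to free cumulants, adding, and converting back. Here is a minimal sketch of that pipeline. It is ours rather than the author's, and it assumes the coefficient form of $L(z) = K(zL(z))$, namely $m_n = \sum_{s=1}^{n} \kappa_s\,[z^{n-s}]\,L(z)^s$. All function names are illustrative.

```python
def power_coeffs(series, s, order):
    """Coefficients of series(z)**s up to z**order; series[0] is the constant term."""
    base = (list(series) + [0] * (order + 1))[: order + 1]
    result = [1] + [0] * order
    for _ in range(s):
        result = [sum(result[i] * base[k - i] for i in range(k + 1))
                  for k in range(order + 1)]
    return result

def free_cumulants(moments):
    """moments = [m_1, ..., m_N]  ->  free cumulants [kappa_1, ..., kappa_N]."""
    L = [1] + list(moments)
    kappa = []
    for n in range(1, len(moments) + 1):
        # the s = n term contributes kappa_n * 1, so solve for kappa_n
        lower = sum(kappa[s - 1] * power_coeffs(L, s, n - s)[n - s] for s in range(1, n))
        kappa.append(moments[n - 1] - lower)
    return kappa

def moments_from_free_cumulants(kappa):
    """Inverse of free_cumulants; m_n only needs m_1, ..., m_{n-1}."""
    L = [1]
    for n in range(1, len(kappa) + 1):
        L.append(sum(kappa[s - 1] * power_coeffs(L, s, n - s)[n - s]
                     for s in range(1, n + 1)))
    return L[1:]

def free_sum_moments(moments_x, moments_y):
    """Moments of X + Y for freely independent X, Y (free cumulants add)."""
    kx, ky = free_cumulants(moments_x), free_cumulants(moments_y)
    return moments_from_free_cumulants([a + b for a, b in zip(kx, ky)])

# Free sum of two +-1 coins: the central binomial coefficients appear.
coin = [0, 1, 0, 1, 0, 1, 0, 1]
print(free_sum_moments(coin, coin))   # [0, 2, 0, 6, 0, 20, 0, 70]
```

This works with truncated series only, so it gives moments rather than a closed form for $G_{X+Y}(z)$. The analytic version of the algorithm is what produces closed forms.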

2.6. Solution of Kesten's problem. Our motivation for building up the additive
theory of free random variables came from Kesten's problem: explicitly determine
the loop generating function of the free group $F_2$, and more generally of the free
group $F_d$, $d \geq 2$. This amounts to computing the moment generating function

\[ L_d(z) = 1 + \sum_{n=1}^{\infty} m_n(S_d) z^n \]

of the sum

\[ S_d = X_1 + \dots + X_d \]

of fid (free identically distributed) random variables with moments

\[ \tau[X_i^n] = \begin{cases} 0, & n \text{ odd} \\ \binom{n}{n/2}, & n \text{ even.} \end{cases} \]

Voiculescu's algorithm gives us the means to obtain this generating function
provided we can feed it the required input, namely a compactly supported probability
measure on $\mathbb{R}$ with moment sequence

\[ 0, 2, 0, 6, 0, 20, 0, 70, \dots \]

As we saw above, the exponential generating function of this moment sequence,

\[ M(z) = \sum_{k=0}^{\infty} \frac{z^{2k}}{k!\,k!} = I_0(2z), \]
coincides with the modified Bessel function of order zero. From the integral
representation

\[ I_0(2z) = \frac{1}{\pi} \int_0^{\pi} e^{2z\cos\theta}\,d\theta \]

we conclude that a random variable X with odd moments zero and even moments
the central binomial coefficients is given by $X = 2\cos(Y)$, where Y has uniform
distribution over $[0, \pi]$. Making the same change of variables that we did in Lecture
One, we obtain

\[ M_X(z) = \frac{1}{\pi} \int_{-2}^{2} e^{tz}\,\frac{dt}{\sqrt{4-t^2}}, \]

so that $\mu_X$ is supported on $[-2, 2]$ with density

\[ \mu_X(dt) = \frac{1}{\pi\sqrt{4-t^2}}\,dt. \]

Figure 14. The arcsine density

This measure is known as the arcsine distribution because its cumulative distribution
function is

\[ \int_{-2}^{t} \frac{ds}{\pi\sqrt{4-s^2}} = \frac{1}{2} + \frac{1}{\pi}\arcsin\Big(\frac{t}{2}\Big). \]

So to obtain the loop generating function $L_2(z)$ for the simple random walk on $F_2$,
we should run Voiculescu's algorithm with input $\mu_X = \mu_Y = \text{arcsine}$.

Let us warm up with an easier computation. Suppose that X, Y are not fid
arcsine random variables, but rather fid ±1-Bernoulli random variables:

\[ \mu_X = \mu_Y = \frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_{+1}. \]

We will use Voiculescu's algorithm to obtain the distribution of X + Y. If X, Y
were classically iid Bernoullis, we would of course obtain the binomial distribution

\[ \mu_{X+Y} = \frac{1}{4}\delta_{-2} + \frac{1}{2}\delta_{0} + \frac{1}{4}\delta_{+2} \]

giving the distribution of the simple random walk on $\mathbb{Z}$ at time two. The result is
quite different in the free case.

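Step One. Obtain the Cauchy transform,

\[ G_X(z) = G_Y(z) = \frac{1}{2}\Big(\frac{1}{z+1} + \frac{1}{z-1}\Big) = \frac{z}{z^2-1} = \sum_{n=0}^{\infty} \frac{1}{z^{2n+1}}. \]

Step Two. Solve the first Voiculescu functional equation. From Step One, this
is

\[ wV(w)^2 - V(w) - w = 0, \]

which has roots

\[ \frac{1+\sqrt{1+4w^2}}{2w} = \frac{1}{w} + w - w^3 + 2w^5 - 5w^7 + \dots, \qquad \frac{1-\sqrt{1+4w^2}}{2w} = -w + w^3 - 2w^5 + \dots \]

We identify the first of these as the Voiculescu transform $V_X(w) = V_Y(w)$.
Step Three. Compute the R-transform,

\[ R_X(w) = R_Y(w) = \frac{1+\sqrt{1+4w^2}}{2w} - \frac{1}{w} = \frac{\sqrt{1+4w^2}-1}{2w}, \]

and sum to obtain

\[ R_{X+Y}(w) = R_X(w) + R_Y(w) = \frac{\sqrt{1+4w^2}-1}{w}. \]

Now restore the principal part,

\[ V_{X+Y}(w) = R_{X+Y}(w) + \frac{1}{w} = \frac{\sqrt{1+4w^2}}{w}. \]

Step Four. Solve the second Voiculescu functional equation. From Step Three,
this is the equation

\[ \frac{\sqrt{1+4G(z)^2}}{G(z)} = z, \]

which has roots

\[ \frac{\pm 1}{\sqrt{z^2-4}} = \frac{\pm 1}{z} + \frac{\pm 2}{z^3} + \frac{\pm 6}{z^5} + \frac{\pm 20}{z^7} + \frac{\pm 70}{z^9} + \frac{\pm 252}{z^{11}} + \dots \]

The positive root is identified as $G_{X+Y}(z)$.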

Finally, we perform the optional fifth step to recover the distribution $\mu_{X+Y}$ whose
Cauchy transform is $G_{X+Y}(z)$. This can be done in two ways. First, we could notice
that the non-zero Laurent coefficients of $G_{X+Y}$ are the central binomial coefficients
$\binom{2n}{n}$, and we just determined that these are the moments of the arcsine distribution.
Alternatively we could use Stieltjes inversion:

\[ \mu_{X+Y}(dt) = -\frac{1}{\pi} \lim_{\varepsilon \to 0^+} \Im\,\frac{1}{\sqrt{(t+i\varepsilon)^2-4}}\,dt = \frac{1}{\pi\sqrt{4-t^2}}\,\mathbf{1}_{[-2,2]}(t)\,dt. \]

We conclude that the sum of two fid Bernoulli random variables has arcsine
distribution. Note the surprising feature that the sum of two free coin tosses has
a continuous distribution over $[-2, 2]$. More generally, we can say that the sum

\[ S_d = X_1 + \dots + X_{2d} \]

of 2d fid ±1-Bernoulli random variables, i.e. the sum of 2d free coin tosses, encodes
all information about the simple random walk on $F_d$ in its moments.
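The Stieltjes inversion step can also be checked numerically. The sketch below is an illustration, not part of the original notes: it evaluates $-\frac{1}{\pi}\Im\,G(t + i\varepsilon)$ for $G(z) = 1/\sqrt{z^2-4}$ at a small fixed $\varepsilon$ and compares it with the arcsine density. We write the square root as $z\sqrt{1-4/z^2}$ so that the principal branch of `cmath.sqrt` picks out the root with $G(z) \sim 1/z$ at infinity. The helper names and the value of `eps` are our choices.

```python
import cmath
from math import pi, sqrt

def cauchy_arcsine(z):
    """G(z) = 1/sqrt(z^2 - 4), on the branch with G(z) ~ 1/z as |z| -> infinity."""
    return 1 / (z * cmath.sqrt(1 - 4 / (z * z)))

def stieltjes_density(G, t, eps=1e-8):
    """Approximate density at t: -(1/pi) * Im G(t + i*eps) for small eps > 0."""
    return -G(complex(t, eps)).imag / pi

def arcsine_density(t):
    """Density of the arcsine law on (-2, 2)."""
    return 1 / (pi * sqrt(4 - t * t))

for t in (-1.5, -0.3, 0.7, 1.9):
    assert abs(stieltjes_density(cauchy_arcsine, t) - arcsine_density(t)) < 1e-6
```

Outside $[-2, 2]$ the same computation returns a value close to zero, in line with the support of the arcsine distribution.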

Let us move on to the solution of Kesten's problem for $F_2$. Here X, Y are fid
arcsine random variables.

Step One. The Cauchy transform $G_X(z) = G_Y(z)$ is the output of our last
application of the algorithm, namely

\[ G_X(z) = G_Y(z) = \frac{1}{\sqrt{z^2-4}}. \]

Step Two. Solve the first Voiculescu functional equation to obtain

\[ V_X(w) = V_Y(w) = \frac{\sqrt{1+4w^2}}{w} = \frac{1}{w} + 2w - 2w^3 + \dots \]

Step Three. Switch to R-transforms, add, switch back to get the Voiculescu
transform of X + Y,

\[ V_{X+Y}(w) = \frac{2\sqrt{1+4w^2}-1}{w} = \frac{1}{w} + 4w - 4w^3 + \dots \]

Step Four. Solve the second Voiculescu functional equation to obtain

\[ G_{X+Y}(z) = \frac{-z + 2\sqrt{z^2-12}}{z^2-16} = \frac{1}{z} + \frac{4}{z^3} + \frac{28}{z^5} + \frac{232}{z^7} + \frac{2092}{z^9} + \dots \]

We can now calculate the loop generating function for $F_2$,

\[ L_2(z) = \frac{1}{z}\,G_{X+Y}\Big(\frac{1}{z}\Big) = \frac{-1 + 2\sqrt{1-12z^2}}{1-16z^2} = 1 + 4z^2 + 28z^4 + 232z^6 + 2092z^8 + \dots \]

More generally, we can run through the above steps for general d to obtain the loop
generating function

\[ L_d(z) = \frac{-(d-1) + d\sqrt{1-4(2d-1)z^2}}{1-4d^2z^2} \]

for the free group $F_d$, $d \geq 2$, which in turn leads to the probability generating
function

\[ \sum_{n=0}^{\infty} p_d(n) z^n = L_d\Big(\frac{z}{2d}\Big) = \frac{-(d-1) + \sqrt{d^2-(2d-1)z^2}}{1-z^2}. \]

Applying standard methods from analytic combinatorics [12], this expression leads
to the asymptotics

\[ p_d(n) \sim \mathrm{const}_d \cdot n^{-3/2} \Big(\frac{2\sqrt{2d-1}}{2d}\Big)^{n} \]

(along even n) for the return probability of the simple random walk on $F_d$, $d \geq 2$. From this we
can conclude that the simple random walk on $F_d$ is transient for all $d \geq 2$, and
indeed that $F_d$ is non-amenable for all $d \geq 2$.
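The loop counts can be cross-checked without any free probability, by walking on the Cayley tree of $F_d$ directly. The sketch below is ours and only tracks the distance from the identity: at distance 0 all 2d steps move away, and at distance k > 0 one step moves back and 2d − 1 steps move away. It compares the resulting counts with the closed form $\frac{-(d-1) + d\sqrt{1-4(2d-1)z^2}}{1-4d^2z^2}$, which we assume for the loop generating function.

```python
from math import sqrt

def loop_counts(d, n_max):
    """Number of closed walks of length 0..n_max based at the identity of F_d."""
    # walks[k] = number of walks of the current length ending at distance k
    walks = [1] + [0] * (n_max + 1)
    counts = [1]
    for _ in range(n_max):
        new = [0] * (n_max + 2)
        for k, w in enumerate(walks):
            if w == 0:
                continue
            if k == 0:
                new[1] += 2 * d * w              # every generator moves away
            else:
                new[k - 1] += w                  # one step back towards the identity
                new[k + 1] += (2 * d - 1) * w    # all other steps move away
        walks = new
        counts.append(walks[0])
    return counts

def kesten_loop_gf(d, z):
    """Closed form of the loop generating function, valid for small |z|."""
    return (-(d - 1) + d * sqrt(1 - 4 * (2 * d - 1) * z * z)) / (1 - 4 * d * d * z * z)

print(loop_counts(2, 8))   # [1, 0, 4, 0, 28, 0, 232, 0, 2092]

# Compare a long partial sum of the series with the closed form at z = 0.1.
for d in (2, 3):
    partial = sum(c * 0.1 ** n for n, c in enumerate(loop_counts(d, 80)))
    assert abs(partial - kesten_loop_gf(d, 0.1)) < 1e-9
```

For d = 1 the same routine counts closed walks on $\mathbb{Z}$ and returns the central binomial coefficients.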

2.7. Spectral measures and free convolution. Voiculescu's algorithm outputs
a function $G_{X+Y}(z)$ which encodes the moments of the sum of two freely independent
random variables X and Y. As input, it requires a pair of compactly supported
real measures $\mu_X, \mu_Y$ which act as distributions for X and Y in the sense that

\[ \tau[X^n] = \int_{\mathbb{R}} t^n\,\mu_X(dt), \qquad \tau[Y^n] = \int_{\mathbb{R}} t^n\,\mu_Y(dt). \]

In our applications of Voiculescu's algorithm we were able to find such measures
by inspection. Nevertheless, it is of theoretical and psychological importance to
determine sufficient conditions guaranteeing the existence of measures with the
required properties.

If $X : \Omega \to \mathbb{C}$ is a random variable defined on a Kolmogorov triple $(\Omega, \mathcal{F}, P)$, its
distribution $\mu_X$ is the pushforward of P by X,

\[ \mu_X(B) = (X_*P)(B) = P(X^{-1}(B)) \]

for any Borel (or Lebesgue) set $B \subseteq \mathbb{C}$. One has the general change of variables
formula

\[ \mathbb{E}[f(X)] = \int_{\mathbb{C}} f(z)\,\mu_X(dz) \]

for any reasonable $f : \mathbb{C} \to \mathbb{C}$. If X is essentially bounded and real-valued, $\mu_X$
is compactly supported in $\mathbb{R}$. As a random variable X living in an abstract
non-commutative probability space $(\mathcal{A}, \tau)$ is not a function, one must obtain $\mu_X$ by
some other means.

The existence of distributions is too much to expect within the framework of a
non-commutative probability space, which is a purely algebraic object. We need to
inject some analytic structure into $(\mathcal{A}, \tau)$. This is achieved by upgrading $\mathcal{A}$ to a
*-algebra, i.e. a complex algebra equipped with a map $* : \mathcal{A} \to \mathcal{A}$ satisfying

\[ (X^*)^* = X, \qquad (\alpha X + \beta Y)^* = \bar{\alpha} X^* + \bar{\beta} Y^*, \qquad (XY)^* = Y^*X^*. \]

This map, which is an abstraction of complex conjugation, is required to be
compatible with the expectation $\tau$ in the sense that

\[ \tau[X^*] = \overline{\tau[X]}. \]

A non-commutative probability space equipped with this extra structure is called
a non-commutative *-probability space.

In the framework of a *-probability space we can single out a class of random 
variables analogous to real random variables in classical probability. These are the 






fixed points of $*$, $X^* = X$. A random variable with this property is called
self-adjoint. Self-adjoint random variables have real expected values, $\tau[X] = \tau[X^*] = \overline{\tau[X]}$,
and more generally $\tau[f(X)] \in \mathbb{R}$ for any polynomial $f$ with real coefficients.

The identification of bounded random variables requires one more upgrade.
Given a *-probability space $(\mathcal{A}, \tau)$, we can introduce a Hermitian form
$B : \mathcal{A} \times \mathcal{A} \to \mathbb{C}$ defined by

\[ B(X, Y) = \tau[XY^*]. \]

If we require that $\tau$ has the positivity property $\tau[XX^*] \geq 0$ for all $X \in \mathcal{A}$, then we
obtain a semi-norm

\[ \|X\| = B(X, X)^{1/2} \]

on $\mathcal{A}$, and we can access the Cauchy-Schwarz inequality

\[ |B(X, Y)| \leq \|X\|\,\|Y\|. \]

Once we have Cauchy-Schwarz, we can prove the monotonicity inequalities

\[ |\tau[X]| \leq |\tau[X^2]|^{1/2} \leq |\tau[X^4]|^{1/4} \]
\[ |\tau[X^3]|^{1/3} \leq |\tau[X^4]|^{1/4} \leq |\tau[X^6]|^{1/6} \]
\[ |\tau[X^5]|^{1/5} \leq |\tau[X^6]|^{1/6} \leq |\tau[X^8]|^{1/8} \]
\[ \vdots \]

from which the chain of inequalities

\[ |\tau[X]| \leq |\tau[X^2]|^{1/2} \leq |\tau[X^4]|^{1/4} \leq |\tau[X^6]|^{1/6} \leq |\tau[X^8]|^{1/8} \leq \dots \]

can be extracted. From this we conclude that the limit

\[ \rho(X) := \lim_{k \to \infty} |\tau[X^{2k}]|^{1/2k} \]

exists in $\mathbb{R}_{\geq 0} \cup \{\infty\}$. This limit is called the spectral radius of X. A random
variable $X \in \mathcal{A}$ is said to be bounded if its spectral radius is finite, $\rho(X) < \infty$.
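A concrete instance of the chain of inequalities may help. The sketch below is an illustration of ours: it takes the algebra of 2 × 2 matrices with the normalized trace as expectation, and a symmetric matrix with eigenvalues 1 and 3, so that $\tau[X^{2k}] = (1 + 9^k)/2$. It then checks that the roots $|\tau[X^{2k}]|^{1/2k}$ increase towards the largest eigenvalue in absolute value. The matrix and the helper names are our choices.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def normalized_trace(A):
    """tau[A] = (A(11) + ... + A(NN)) / N."""
    return sum(A[i][i] for i in range(len(A))) / len(A)

def even_moment_roots(X, k_max):
    """[|tau[X^2]|^(1/2), |tau[X^4]|^(1/4), ..., |tau[X^(2 k_max)]|^(1/(2 k_max))]."""
    X2 = mat_mul(X, X)
    power = X2
    roots = []
    for k in range(1, k_max + 1):
        roots.append(abs(normalized_trace(power)) ** (1 / (2 * k)))
        power = mat_mul(power, X2)
    return roots

X = [[2, 1], [1, 2]]          # symmetric, eigenvalues 1 and 3
roots = even_moment_roots(X, 40)

# The sequence is non-decreasing and approaches the spectral radius 3 from below.
assert all(a <= b + 1e-12 for a, b in zip(roots, roots[1:]))
assert 2.9 < roots[-1] <= 3.0
```

The convergence is slow, since here $|\tau[X^{2k}]|^{1/2k}$ is roughly $3 \cdot 2^{-1/2k}$. This is typical when the largest eigenvalue carries only part of the mass.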

In the framework of a non-commutative *-probability space $(\mathcal{A}, \tau)$ with
non-negative expectation, bounded self-adjoint random variables play the role of
essentially bounded real-valued random variables in classical probability theory. With
some work, one may deduce from the Riesz representation theorem that to each
bounded self-adjoint X corresponds a unique Borel measure $\mu_X$ supported in
$[-\rho(X), \rho(X)]$ such that

\[ \tau[f(X)] = \int_{\mathbb{R}} f(t)\,\mu_X(dt) \]

for all polynomial functions $f : \mathbb{C} \to \mathbb{C}$. The details of this argument, in which a
reverse-engineered Cauchy transform plays the key role, are given in Tao's notes
[41]. The measure $\mu_X$ is often called the spectral measure of X, but we will
refer to it as the distribution of X. There is also a converse to this result: given
any compactly supported probability measure $\mu$ on $\mathbb{R}$, there exists a bounded
self-adjoint random variable X living in some non-commutative *-probability space
$(\mathcal{A}, \tau)$ whose distribution is $\mu$. Consequently, given two compactly supported real
probability measures $\mu, \nu$ we may define a new measure $\mu \boxplus \nu$ as "the distribution
of the random variable X + Y, where X and Y are freely independent bounded self-adjoint
random variables with distributions $\mu$ and $\nu$ respectively." Since the sum of two
bounded self-adjoint random variables is again bounded self-adjoint, $\mu \boxplus \nu$ is
another compactly supported real probability measure. Moreover, $\mu \boxplus \nu$ does not
depend on the particular random variables chosen to realize $\mu$ and $\nu$. Thus we get
a bona fide binary operation $\boxplus$ on the set of compactly supported real probability
measures, which is known as the additive free convolution. For example, we computed
above that

\[ \text{Bernoulli} \boxplus \text{Bernoulli} = \text{Arcsine}. \]

The additive free convolution of measures is induced by the addition of free
random variables. As such, it is the free analogue of the classical convolution of
measures induced by the addition of classically independent random variables. Like
classical convolution, free convolution can be defined for measures with unbounded
support, but this requires more work [2].

2.8. Free Poisson limit theorem. Select positive real numbers $\lambda$ and $\alpha$.
Consider the measure

\[ \mu_N = \Big(1 - \frac{\lambda}{N}\Big)\delta_0 + \frac{\lambda}{N}\delta_\alpha, \]

which consists of an atom of mass $1 - \frac{\lambda}{N}$ placed at zero and an atom of mass $\frac{\lambda}{N}$
placed at $\alpha$. For N sufficiently large, $\mu_N$ is a probability measure. Its moment
sequence is

\[ m_n(\mu_N) = \frac{\lambda}{N}\alpha^n, \qquad n \geq 1. \]

The N-fold classical convolution of $\mu_N$ with itself,

\[ \mu_N^{*N} = \underbrace{\mu_N * \dots * \mu_N}_{N}, \]

converges weakly to the Poisson measure of rate $\lambda$ and jump size $\alpha$ as $N \to \infty$. This
is a classical limit theorem in probability known as the Poisson Limit Theorem, or
the Law of Rare Events.

Let us obtain a free analogue of the Poisson Limit Theorem. This should be a
limit law for the iterated free convolution

\[ \mu_N^{\boxplus N} = \underbrace{\mu_N \boxplus \dots \boxplus \mu_N}_{N}. \]

From the free moment-cumulant formula, we obtain the estimate

\[ \kappa_n(\mu_N) = \frac{\lambda}{N}\alpha^n + O\Big(\frac{1}{N^2}\Big). \]

Since free cumulants linearize free convolution, we have

\[ \kappa_n(\mu_N^{\boxplus N}) = N\kappa_n(\mu_N) = \lambda\alpha^n + O\Big(\frac{1}{N}\Big). \]

Thus

\[ \lim_{N \to \infty} \kappa_n(\mu_N^{\boxplus N}) = \lambda\alpha^n, \]

and it remains to determine the measure $\mu$ with this free cumulant sequence. The
Voiculescu transform of $\mu$ is

\[ V_\mu(w) = \frac{1}{w} + \sum_{n=0}^{\infty} \lambda\alpha^{n+1}w^n = \frac{1}{w} + \frac{\lambda\alpha}{1-\alpha w}, \]

so the second Voiculescu functional equation $V_\mu(G_\mu(z)) = z$ yields

\[ \frac{1}{G_\mu(z)} + \frac{\lambda\alpha}{1-\alpha G_\mu(z)} = z. \]

This equation has two solutions, and the one which behaves like $1/z$ for $|z| \to \infty$
is the Cauchy transform of $\mu$. We obtain

\[ G_\mu(z) = \frac{z + \alpha(1-\lambda) - \sqrt{(z-\alpha(1+\lambda))^2 - 4\lambda\alpha^2}}{2\alpha z}. \]

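Applying Stieltjes inversion, we find that $\mu$ is given by

\[ \mu(dt) = \begin{cases} (1-\lambda)\delta_0 + m(t)\,dt, & 0 < \lambda \leq 1 \\ m(t)\,dt, & \lambda > 1, \end{cases} \]

where

\[ m(t) = \frac{1}{2\pi\alpha t}\sqrt{4\lambda\alpha^2 - (t-\alpha(1+\lambda))^2} \]

for $t$ in the interval $[\alpha(1-\sqrt{\lambda})^2, \alpha(1+\sqrt{\lambda})^2]$ and $m(t) = 0$ elsewhere. For
$\lambda \leq 1$ the density $m$ has total mass $\lambda$, which together with the atom of mass
$1-\lambda$ at zero gives a probability measure.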

This measure is known as the Marchenko-Pastur distribution after the Ukrainian 
mathematical physicists Vladimir Marchenko and Leonid Pastur, who discovered it 
in their study of the asymptotic eigenvalue distribution of a certain class of random 
matrices. 
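The normalization and the first two moments of the limit law can be checked numerically. The sketch below is ours and rests on stated assumptions: the absolutely continuous part has density $m(t) = \frac{1}{2\pi\alpha t}\sqrt{4\lambda\alpha^2 - (t-\alpha(1+\lambda))^2}$ on its support, and $\lambda \neq 1$, so that the support stays away from zero. We integrate by the midpoint rule after the substitution $t = \alpha(1+\lambda) + 2\alpha\sqrt{\lambda}\cos\theta$. The expected values are $\min(\lambda, 1)$ for the total mass, $\kappa_1 = \lambda\alpha$ for the mean, and $\kappa_2 + \kappa_1^2 = \lambda\alpha^2 + \lambda^2\alpha^2$ for the second moment.

```python
from math import cos, sin, pi, sqrt

def mp_integral(lam, alpha, power=0, steps=20000):
    """Midpoint-rule value of the integral of t**power * m(t) over the support of m."""
    centre = alpha * (1 + lam)
    radius = 2 * alpha * sqrt(lam)
    h = pi / steps
    total = 0.0
    for j in range(steps):
        theta = (j + 0.5) * h
        t = centre + radius * cos(theta)
        # sqrt(4*lam*alpha^2 - (t - centre)^2) = radius*sin(theta), |dt| = radius*sin(theta) d(theta)
        total += t ** power * (radius * sin(theta)) ** 2 / (2 * pi * alpha * t) * h
    return total

for lam, alpha in ((0.5, 2.0), (3.0, 2.0)):
    mass = mp_integral(lam, alpha, 0)
    mean = mp_integral(lam, alpha, 1)
    second = mp_integral(lam, alpha, 2)
    assert abs(mass - min(lam, 1.0)) < 1e-6
    assert abs(mean - lam * alpha) < 1e-6
    assert abs(second - (lam * alpha ** 2 + (lam * alpha) ** 2)) < 1e-6
```

For $\lambda < 1$ the missing mass $1 - \lambda$ sits in the atom at zero, which contributes nothing to the moments of order $n \geq 1$.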

2.9. Semicircle flow. Given r > 0, let $\mu_r$ be the semicircular measure of radius
r,

\[ \mu_r(dt) = \frac{2}{\pi r^2}\sqrt{r^2-t^2}\,dt, \qquad t \in [-r, r]. \]

Taking r = 2 yields the standard semicircular distribution. Let $\mu$ be an arbitrary
compactly supported probability measure on $\mathbb{R}$. The function

\[ \boxplus_\mu : \{\text{positive real numbers}\} \to \{\text{compactly supported real measures}\} \]

defined by

\[ \boxplus_\mu(r) = \mu \boxplus \mu_r \]

is called the semicircle flow. The semicircle flow has very interesting dynamics: in
one of his earliest articles on free random variables [33], Voiculescu showed that

\[ \frac{\partial G}{\partial t} + G\,\frac{\partial G}{\partial z} = 0, \]

where $G = G(r, z)$ is the Cauchy transform of $\boxplus_\mu(r) = \mu \boxplus \mu_r$ and the flow is
parametrized by the variance $t = r^2/4$ of $\mu_r$. Thus the free analogue of
the heat equation is the complex inviscid Burgers equation. For a detailed analysis 
of the semicircle flow, see 0. 
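The Burgers equation can be sanity-checked by finite differences in the simplest case $\mu = \delta_0$, where the flow is the semicircular family itself. The sketch below is ours and assumes the flow is parametrized by the variance $t = r^2/4$, so that $G(t, z) = \frac{z - \sqrt{z^2-4t}}{2t}$. In terms of the radius the same identity reads $\partial_r G + \frac{r}{2}\,G\,\partial_z G = 0$. The step size and the sample points are arbitrary choices.

```python
import cmath

def G(t, z):
    """Cauchy transform of the centred semicircular law of variance t (radius 2*sqrt(t))."""
    # z*sqrt(1 - 4t/z^2) is the branch of sqrt(z^2 - 4t) that behaves like z at infinity
    return (z - z * cmath.sqrt(1 - 4 * t / (z * z))) / (2 * t)

def burgers_residual(t, z, h=1e-5):
    """Central-difference estimate of dG/dt + G * dG/dz."""
    dG_dt = (G(t + h, z) - G(t - h, z)) / (2 * h)
    dG_dz = (G(t, z + h) - G(t, z - h)) / (2 * h)
    return dG_dt + G(t, z) * dG_dz

def radius_residual(r, z, h=1e-5):
    """The same check in terms of the radius r: dG/dr + (r/2) * G * dG/dz."""
    t = r * r / 4
    dG_dr = (G((r + h) ** 2 / 4, z) - G((r - h) ** 2 / 4, z)) / (2 * h)
    dG_dz = (G(t, z + h) - G(t, z - h)) / (2 * h)
    return dG_dr + (r / 2) * G(t, z) * dG_dz

for t, z in ((1.0, 3 + 1j), (0.5, -2 + 2j), (2.0, 0.5 + 1.5j)):
    assert abs(burgers_residual(t, z)) < 1e-6
assert abs(radius_residual(2.0, 3 + 1j)) < 1e-6
```

The residuals vanish up to discretization error because $G$ satisfies $tG^2 - zG + 1 = 0$, and differentiating this identity in t and in z gives the equation directly.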

3. Lecture Three: Modelling the Free World 

Free random variables are of interest for many reasons. First and foremost, 
Voiculescu's free probability theory is an intrinsically appealing subject worthy of 
study from a purely esthetic point of view. Adding to this are the many remarkable 
connections between free probability and other parts of mathematics, including 
operator algebras, representation theory, and random matrix theory. This lecture 
is an exposition of Voiculescu's discovery that random matrices provide asymptotic 
models of free random variables. We follow the treatment of Nica and Speicher [28].



3.1. Algebraic model of a free arcsine pair. In Lecture Two we gave a
group-theoretic construction of a pair of free random variables each of which has an arcsine
distribution. In this example, the algebra of random variables is the group algebra
$\mathcal{A} = \mathbb{C}[F_2]$ of the free group on two generators A, B, and the expectation $\tau$ is the
coefficient-of-id functional. We saw that the random variables

\[ X = A + A^{-1}, \qquad Y = B + B^{-1} \]

are freely independent, and each has an arcsine distribution:

\[ \tau[X^n] = \tau[Y^n] = \begin{cases} 0, & \text{if } n \text{ odd} \\ \binom{n}{n/2}, & \text{if } n \text{ even.} \end{cases} \]

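3.2. Algebraic model of a free semicircular pair. We can give a linear-algebraic
model of a pair of free random variables each of which has a semicircular
distribution. The ingredients in this construction are a complex vector space V and an
inner product $B : V \times V \to \mathbb{C}$. Our random variables will be endomorphisms of the
tensor algebra over V,

\[ \mathcal{F}(V) = \bigoplus_{n=0}^{\infty} V^{\otimes n}, \]

which physicists and operator algebraists call the full Fock space over V after the
Russian physicist Vladimir Fock. We view the zeroth tensor power $V^{\otimes 0}$ as the
line in $\mathcal{F}(V)$ spanned by a distinguished unit vector $\mathbf{v}_\emptyset$ called the vacuum vector. Let
$\mathcal{A} = \mathrm{End}\,\mathcal{F}(V)$. This is a unital algebra, with unit the identity operator
$I : \mathcal{F}(V) \to \mathcal{F}(V)$.
To make $\mathcal{A}$ into a non-commutative probability space we need an expectation.
We get an expectation by lifting the inner product on V to the inner product
$\mathcal{F}(B) : \mathcal{F}(V) \times \mathcal{F}(V) \to \mathbb{C}$ defined by

\[ \mathcal{F}(B)(\mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_m, \mathbf{w}_1 \otimes \dots \otimes \mathbf{w}_n) = \delta_{mn} B(\mathbf{v}_1, \mathbf{w}_1) \cdots B(\mathbf{v}_n, \mathbf{w}_n). \]

Note that this inner product makes $\mathcal{A} = \mathrm{End}\,\mathcal{F}(V)$ into a *-algebra: for each $X \in \mathcal{A}$,
$X^*$ is that linear operator for which the equation

\[ \mathcal{F}(B)(Xs, t) = \mathcal{F}(B)(s, X^*t) \]

holds true for every pair of tensors $s, t \in \mathcal{F}(V)$. The expectation on $\mathcal{A}$ is the linear
functional $\tau : \mathcal{A} \to \mathbb{C}$ defined by

\[ \tau[X] = \mathcal{F}(B)(X\mathbf{v}_\emptyset, \mathbf{v}_\emptyset). \]

This functional is called vacuum expectation. It is unital because

\[ \tau[I] = \mathcal{F}(B)(I\mathbf{v}_\emptyset, \mathbf{v}_\emptyset) = \mathcal{F}(B)(\mathbf{v}_\emptyset, \mathbf{v}_\emptyset) = 1. \]

Thus $(\mathcal{A}, \tau)$ is a non-commutative *-probability space.

To construct a semicircular element in $(\mathcal{A}, \tau)$, notice that to every non-zero vector
$\mathbf{v} \in V$ is naturally associated a pair of linear operators $R_\mathbf{v}, L_\mathbf{v} : \mathcal{F}(V) \to \mathcal{F}(V)$ whose
action on decomposable tensors is defined by tensoring,

\[ R_\mathbf{v}(\mathbf{v}_\emptyset) = \mathbf{v} \]
\[ R_\mathbf{v}(\mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_n) = \mathbf{v} \otimes \mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_n, \qquad n \geq 1, \]

and insertion-contraction,

\[ L_\mathbf{v}(\mathbf{v}_\emptyset) = 0 \]
\[ L_\mathbf{v}(\mathbf{v}_1) = B(\mathbf{v}_1, \mathbf{v})\mathbf{v}_\emptyset \]
\[ L_\mathbf{v}(\mathbf{v}_1 \otimes \mathbf{v}_2 \otimes \dots \otimes \mathbf{v}_n) = B(\mathbf{v}_1, \mathbf{v})\,\mathbf{v}_2 \otimes \dots \otimes \mathbf{v}_n, \qquad n \geq 2. \]

Since $R_\mathbf{v}$ maps $V^{\otimes n} \to V^{\otimes (n+1)}$ for each $n \geq 0$, it is called the raising (or creation)
operator associated to $\mathbf{v}$. Since $L_\mathbf{v}$ maps $V^{\otimes n} \to V^{\otimes (n-1)}$ for each $n \geq 1$ and kills
the vacuum, it is called the lowering (or annihilation) operator associated to $\mathbf{v}$. We
have $R_\mathbf{v}^* = L_\mathbf{v}$, and also

\[ L_\mathbf{v} R_\mathbf{w} = B(\mathbf{w}, \mathbf{v})I \]

for any vectors $\mathbf{v}, \mathbf{w} \in V$.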

Let $\mathbf{v} \in V$ be a unit vector, $B(\mathbf{v}, \mathbf{v}) = 1$, and consider the self-adjoint random
variable

\[ X_\mathbf{v} = L_\mathbf{v} + R_\mathbf{v}. \]

We claim that $X_\mathbf{v}$ has a semicircular distribution:

\[ m_n(X_\mathbf{v}) = \tau[X_\mathbf{v}^n] = \begin{cases} 0, & \text{if } n \text{ odd} \\ \mathrm{Cat}_{n/2}, & \text{if } n \text{ even.} \end{cases} \]

To see this, we expand

\[ \tau[X_\mathbf{v}^n] = \tau[(L_\mathbf{v} + R_\mathbf{v})^n] = \sum_{W \in \{L_\mathbf{v}, R_\mathbf{v}\}^n} \tau[W], \]

where the summation is over all words of length n in the operators $L_\mathbf{v}, R_\mathbf{v}$. Only
a very small fraction of these words have non-zero vacuum expectation. Using the
relation $L_\mathbf{v}R_\mathbf{v} = I$ to remove occurrences of the substring $L_\mathbf{v}R_\mathbf{v}$, we see that any
such word can be placed in normally ordered form

\[ W = \underbrace{R_\mathbf{v} \cdots R_\mathbf{v}}_{a}\,\underbrace{L_\mathbf{v} \cdots L_\mathbf{v}}_{b} \]

with $a + b \leq n$. Since the lowering operator kills the vacuum vector, the vacuum
expectation of W can only be non-zero if b = 0. On the other hand, since $V^{\otimes a}$ is
$\mathcal{F}(B)$-orthogonal to $V^{\otimes 0}$ for a > 0, we must also have a = 0 to obtain a non-zero
contribution. Thus the only words which contribute to the above sum are those
whose normally ordered form is that of the identity operator. If we replace each
occurrence of $L_\mathbf{v}$ in W with a +1 and each occurrence of $R_\mathbf{v}$ in W with a −1,
the condition that W reduces to I becomes the condition that the corresponding
bitstring has total sum zero and all partial sums non-negative. There are no such
bitstrings for n odd, and as we saw in Lecture One when n is even the required
bitstrings are counted by the Catalan number $\mathrm{Cat}_{n/2}$.
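The counting argument above is easy to replay by machine for small n. The sketch below is an illustration with hypothetical function names: it enumerates all words of length n in the letters L and R, reads each word from left to right with L counted as +1 and R as −1, and keeps those with non-negative partial sums and total sum zero.

```python
from itertools import product
from math import comb

def vacuum_expectation(word):
    """1 if the word in L, R normally orders to the identity, else 0."""
    height = 0
    for letter in word:                 # read left to right, as in the text
        height += 1 if letter == "L" else -1
        if height < 0:                  # a partial sum went negative
            return 0
    return 1 if height == 0 else 0

def semicircular_moment(n):
    """tau[(L + R)^n], summed over all 2^n words of length n."""
    return sum(vacuum_expectation(word) for word in product("LR", repeat=n))

def catalan(k):
    """The k-th Catalan number."""
    return comb(2 * k, k) // (k + 1)

print([semicircular_moment(n) for n in range(9)])   # [1, 0, 1, 0, 2, 0, 5, 0, 14]
```

The even entries 1, 1, 2, 5, 14 are the Catalan numbers, and the odd moments vanish.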

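Now let $V_1$ and $V_2$ be B-orthogonal vector subspaces of V, each of dimension
at least one, and choose unit vectors $\mathbf{x} \in V_1$, $\mathbf{y} \in V_2$. According to the above
construction, the random variables

\[ X = L_\mathbf{x} + R_\mathbf{x}, \qquad Y = L_\mathbf{y} + R_\mathbf{y} \]

are semicircular. In fact, they are freely independent. To prove this, we must
demonstrate that

\[ \tau[f_1(X)g_1(Y) \cdots f_k(X)g_k(Y)] = 0 \]

whenever $f_1, g_1, \dots, f_k, g_k$ are polynomials such that

\[ \tau[f_1(X)] = \tau[g_1(Y)] = \dots = \tau[f_k(X)] = \tau[g_k(Y)] = 0. \]

This hypothesis means that $f_i(X) = f_i(L_\mathbf{x} + R_\mathbf{x})$ is a polynomial in $L_\mathbf{x}, R_\mathbf{x}$ none of
whose terms are words which normally order to I, and similarly $g_j(Y) = g_j(L_\mathbf{y} + R_\mathbf{y})$
is a polynomial in $L_\mathbf{y}, R_\mathbf{y}$ none of whose terms are words which normally order to
I. Consequently, the alternating product

\[ f_1(X)g_1(Y) \cdots f_k(X)g_k(Y) \]

is a polynomial in the operators $L_\mathbf{x}, R_\mathbf{x}, L_\mathbf{y}, R_\mathbf{y}$ whose terms are words W of the
form

\[ W = W_1^{\mathbf{x}} W_1^{\mathbf{y}} \cdots W_k^{\mathbf{x}} W_k^{\mathbf{y}} \]

with $W_i^{\mathbf{x}}$ a word in $L_\mathbf{x}, R_\mathbf{x}$ which does not normally order to I and $W_i^{\mathbf{y}}$ a word in
$L_\mathbf{y}, R_\mathbf{y}$ which does not normally order to I. Thus the only way that W can have
a non-zero vacuum expectation is if we can use the relations $L_\mathbf{x}R_\mathbf{y} = B(\mathbf{y}, \mathbf{x})I$ and
$L_\mathbf{y}R_\mathbf{x} = B(\mathbf{x}, \mathbf{y})I$ to normally order W as

\[ B(\mathbf{x}, \mathbf{y})^m B(\mathbf{y}, \mathbf{x})^n I \]

with m, n non-negative integers at least one of which is positive. But, since $\mathbf{x}, \mathbf{y}$ are
B-orthogonal, this is the zero element of $\mathcal{A}$, which has vacuum expectation zero.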
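Both claims can be tested on a small explicit model. The sketch below is ours: it takes V two-dimensional with a B-orthonormal basis x, y, represents a decomposable basis tensor as a tuple of letters (the empty tuple being the vacuum), and represents a vector of the Fock space as a dictionary from such tuples to coefficients. The operator and helper names are illustrative.

```python
def add(u, v, sign=1):
    """u + sign * v for vectors stored as {basis word: coefficient}."""
    out = dict(u)
    for word, coeff in v.items():
        out[word] = out.get(word, 0) + sign * coeff
    return {w: c for w, c in out.items() if c != 0}

def raising(a, vec):
    """R_a: tensor with the basis vector a on the left."""
    return {(a,) + word: c for word, c in vec.items()}

def lowering(a, vec):
    """L_a: contract the first factor against a (orthonormal basis); kills the vacuum."""
    return {word[1:]: c for word, c in vec.items() if word and word[0] == a}

def semicircular(a):
    """The operator L_a + R_a, as a function on vectors."""
    return lambda vec: add(lowering(a, vec), raising(a, vec))

def centred_square(op):
    """f(T) for f(t) = t^2 - 1; its vacuum expectation is zero when T is semicircular."""
    return lambda vec: add(op(op(vec)), vec, sign=-1)

def tau(*ops):
    """Vacuum expectation of the product ops[0] ops[1] ... ops[-1]."""
    vec = {(): 1}
    for op in reversed(ops):            # the rightmost operator acts first
        vec = op(vec)
    return vec.get((), 0)

X, Y = semicircular("x"), semicircular("y")
fX, gY = centred_square(X), centred_square(Y)

assert tau(X, X, X, X) == 2             # fourth semicircular moment, Cat_2
assert tau(fX) == 0 and tau(gY) == 0    # centred polynomials
assert tau(fX, gY, fX, gY) == 0         # alternating centred product vanishes
```

The same functions give $\tau[X^2Y^2] = 1 = \tau[X^2]\tau[Y^2]$ and $\tau[XYXY] = 0$, the latter being a value that classically independent variables with these moments would not produce.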

3.3. Algebraic versus asymptotic models. We have constructed algebraic mod- 
els for a free arcsine pair and a free semicircular pair. Perhaps these should be 
called examples rather than models, since the term model connotes some degree of 
imprecision or ambiguity and algebra is a subject which allows neither. 
Suppose that X, Y are free random variables living in an abstract non-commutative
probability space $(\mathcal{A}, \tau)$. An approximate model for this pair will consist of a
sequence $(\mathcal{A}_N, \tau_N)$ of concrete or canonical non-commutative probability spaces
together with a sequence of pairs $X_N, Y_N$ of random variables from these spaces such
that $X_N$ models X and $Y_N$ models Y, i.e.

\[ \tau[f(X)] = \lim_{N \to \infty} \tau_N[f(X_N)], \qquad \tau[g(Y)] = \lim_{N \to \infty} \tau_N[g(Y_N)] \]

for any polynomials f, g, and such that free independence holds in the large N
limit, i.e.

\[ \lim_{N \to \infty} \tau_N[f_1(X_N)g_1(Y_N) \cdots f_k(X_N)g_k(Y_N)] = 0 \]

whenever $f_1, g_1, \dots, f_k, g_k$ are polynomials such that

\[ \lim_{N \to \infty} \tau_N[f_1(X_N)] = \lim_{N \to \infty} \tau_N[g_1(Y_N)] = \dots = \lim_{N \to \infty} \tau_N[f_k(X_N)] = \lim_{N \to \infty} \tau_N[g_k(Y_N)] = 0. \]

The question of which non-commutative probability spaces are considered concrete
or canonical, and could therefore serve as potential models, is subjective and
determined by individual experience. Three examples of concrete non-commutative
probability spaces are:

Group probability spaces: $(\mathcal{A}, \tau)$ consists of the group algebra $\mathcal{A} = \mathbb{C}[G]$
of a group G, and $\tau$ is the coefficient-of-identity expectation. This
non-commutative probability space is commutative if and only if G is abelian.

Classical probability spaces: $(\mathcal{A}, \tau)$ consists of the algebra of complex random
variables $\mathcal{A} = L^{\infty-}(\Omega, \mathcal{F}, P) = \bigcap_{p=1}^{\infty} L^p(\Omega, \mathcal{F}, P)$ defined on a Kolmogorov
triple which have finite absolute moments of all orders, and $\tau$ is the
classical expectation $\tau[X] = \mathbb{E}[X]$. Classical probability spaces are always
commutative.

Matrix probability spaces: $(\mathcal{A}, \tau)$ consists of the algebra $\mathcal{A} = M_N(\mathbb{C})$ of
N × N complex matrices X = [X(ij)], and expectation is the normalized
trace, $\tau[X] = \mathrm{tr}_N[X] = \frac{X(11) + \dots + X(NN)}{N}$. This non-commutative probability
space is commutative if and only if N = 1.

The first class of model non-commutative probability spaces, group probability
spaces, is algebraic and we are trying to move away from algebraic examples.
The second model class, classical probability spaces, has genuine randomness but
is commutative. The third model class, matrix probability spaces, has a parameter
N that can be pushed to infinity but has no randomness. By combining classical
probability spaces and matrix probability spaces we arrive at a class of model
non-commutative probability spaces which incorporate both randomness and a
parameter which can be made large. Thus we are led to consider random matrices.

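The space of N × N complex random matrices is the non-commutative probability
space $(\mathcal{A}_N, \tau_N) = (L^{\infty-}(\Omega, \mathcal{F}, P) \otimes M_N(\mathbb{C}), \mathbb{E} \otimes \mathrm{tr}_N)$. A random variable $X_N$ in
this space may be viewed as an N × N matrix whose entries $X_N(ij)$ belong to
the algebra $L^{\infty-}(\Omega, \mathcal{F}, P)$. The expectation $\tau_N[X_N]$ is the expected value of the
normalized trace:

\[ \tau_N[X_N] = (\mathbb{E} \otimes \mathrm{tr}_N)[X_N] = \mathbb{E}\left[\frac{X_N(11) + \dots + X_N(NN)}{N}\right]. \]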
We have already seen indications of a connection between free probability and
random matrices. The fact that Wigner's semicircle law assumes the role of the 
Gaussian distribution in free probability signals a connection between these sub- 
jects. Another example is the occurrence of the Marchenko-Pastur distribution in 
the free version of the Poisson limit theorem — this distribution is well-known in 
random matrix theory in connection with the asymptotic eigenvalue distribution of 
Wishart matrices. In Lecture One, we were led to free independence when we tried 
to solve a counting problem associated to graphs drawn in the plane. The use of 
random matrices to enumerate planar graphs has been a subject of much interest in 
mathematical physics since the seminal work of Edouard Brezin, Claude Itzykson, 
Giorgio Parisi and Jean-Bernard Zuber [5], which built on insights of Gerardus 't 
Hooft. Then, when we examined the dynamics of the semicircle flow, we found 
that the free analogue of the heat equation is the complex Burgers equation. This 
partial differential equation actually appeared in Voiculescu's work [3^ before it
emerged in random matrix theory [22] and the discrete analogue of random matrix
theory, the dimer model [21].

In the remainder of these notes, we will model a pair of free random variables
$X, Y$ living in an abstract non-commutative probability space using sequences
$X_N, Y_N$ of random matrices living in random matrix space. This is first carried out
in the important special case where $X, Y$ are semicircular random variables, then
adapted to allow $Y$ to have arbitrary distribution while $X$ remains semicircular,
and finally relaxed to allow $X, Y$ to have arbitrary specified distributions. The
random matrix models of free random variables which we describe below were used by
Voiculescu in order to resolve several previously intractable problems in the theory
of von Neumann algebras, see [24, 44] for more information. Random matrix models
which approximate free random variables in a stronger sense than that described
here were subsequently used by Uffe Haagerup and Steen Thorbjørnsen [15] to
resolve another operator algebras conjecture, this time concerning the Ext-invariant
of the reduced $C^*$-algebra of $\mathbb{F}_2$. An important feature of the connection between
free probability and random matrices is that it can sometimes be inverted to obtain
information about random matrices using the free calculus. For each of the three
matrix models constructed we give an example of this type.

3.4. Random matrix model of a free semicircular pair. In this subsection 
we construct a random matrix model for a free semicircular pair X, Y. 

In Lecture One, we briefly discussed Wigner matrices. A real Wigner matrix is a 
symmetric matrix whose entries are centred real random variables which are inde- 
pendent up to the symmetry constraint. A complex Wigner matrix is a Hermitian 
matrix whose entries are centred complex random variables which are independent 
up to the complex symmetry constraint. Our matrix model for a free semicircular 
pair will be built out of complex Wigner matrices of a very special type: they will 
be GUE random matrices. 

To construct a GUE random matrix $X_N$, we start with a Ginibre matrix $Z_N$. Let
$(\Omega, \mathcal{F}, P)$ be a Kolmogorov triple. The $N^2$ matrix elements
$Z_N(ij) \in L^{\infty-}(\Omega, \mathcal{F}, P)$ of a Ginibre matrix are iid complex Gaussian
random variables of mean zero and variance $1/N$. Thus $Z_N$ is a random variable in the
non-commutative probability space
$(\mathcal{A}_N, \tau_N) = (L^{\infty-}(\Omega, \mathcal{F}, P) \otimes \mathcal{M}_N(\mathbb{C}), \mathbb{E} \otimes \mathrm{tr}_N)$.
The symmetrized random matrix $X_N = \frac{1}{2}(Z_N + Z_N^*)$ is again a member of random
matrix space. The joint distribution of the eigenvalues of $X_N$ can be explicitly
computed, and is given by

\[
P(\lambda_N(1) \in I_1, \dots, \lambda_N(N) \in I_N) \propto \int_{I_1} \dots \int_{I_N} e^{-N^2 H(\lambda_1, \dots, \lambda_N)} \, d\lambda_1 \dots d\lambda_N
\]

for any intervals $I_1, \dots, I_N \subseteq \mathbb{R}$, where $H$ is the log-gas Hamiltonian [13]

\[
H(\lambda_1, \dots, \lambda_N) = \frac{1}{2N} \sum_{i=1}^N \lambda_i^2 - \frac{1}{N^2} \sum_{1 \le i \ne j \le N} \log|\lambda_i - \lambda_j|.
\]

The random point process on the real line driven by this Hamiltonian is known as
the Gaussian Unitary Ensemble, and $X_N$ is termed a GUE random matrix. GUE
random matrices sit at the nexus of the two principal strains of complex random
matrix theory: they are simultaneously Hermitian Wigner matrices and unitarily
invariant matrices. The latter condition means that the distribution of a GUE
matrix in the space of $N \times N$ Hermitian matrices is invariant under conjugation by
unitary matrices. The spectral statistics of a GUE random matrix can be computed
in gory detail from knowledge of the joint distribution of eigenvalues, and virtually
any question can be answered. The universality programme in random matrix
theory seeks to show that, in the limit $N \to \infty$ and under mild hypotheses, Hermitian
Wigner matrices as well as unitarily invariant Hermitian matrices exhibit the same
spectral statistics as GUE matrices.
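The construction of a GUE matrix from a Ginibre matrix can be sketched in a few lines of Python. This is only an illustration under an assumed normalization: the function names `sample_gue` and `normalized_trace_of_square` are hypothetical, the Ginibre entries are taken with $\mathbb{E}|Z_N(ij)|^2 = 1/N$, and the symmetrization constant is taken to be $1/\sqrt{2}$ so that the entries satisfy $\mathbb{E}|X_N(ij)|^2 = 1/N$. Other conventions for the variance of a complex Gaussian move a factor of 2 between these two constants.

```python
import random


def sample_gue(n, rng):
    """Sample an n x n Hermitian matrix (list of rows of complex numbers)."""
    # Ginibre matrix: iid complex Gaussians with E|Z(ij)|^2 = 1/n (assumed convention).
    s = (1.0 / (2.0 * n)) ** 0.5
    z = [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]
         for _ in range(n)]
    # Symmetrize; the factor 1/sqrt(2) is an assumption giving E|X(ij)|^2 = 1/n.
    c = 0.5 ** 0.5
    return [[c * (z[i][j] + z[j][i].conjugate()) for j in range(n)]
            for i in range(n)]


def normalized_trace_of_square(x):
    """tr_N[X^2] = (1/N) sum_{i,j} X(ij) X(ji), which is real for Hermitian X."""
    n = len(x)
    return sum((x[i][j] * x[j][i]).real for i in range(n) for j in range(n)) / n


rng = random.Random(2010)
x = sample_gue(60, rng)
# For large n this should be close to 1, the second moment of the semicircle.
print(normalized_trace_of_square(x))
```

With this normalization the expected value of `normalized_trace_of_square` is exactly 1 for every $n$, and the sample value fluctuates around it on the scale $1/n$.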

Given the central role of the GUE in random matrix theory, it is fitting that our
matrix model for a free semicircular pair is built from a pair of independent GUE
matrices. The first step in proving this is to show that a single GUE matrix $X_N$
in random matrix space $(\mathcal{A}_N, \tau_N)$ is an asymptotic model for a single semicircular
random variable $X$ living in an abstract non-commutative probability space $(\mathcal{A}, \tau)$.
In other words, we need to prove that

\[
\lim_{N \to \infty} \tau_N[X_N^n] = \tau[X^n]
\]

for each $n \ge 1$. In order to establish this, we will not need access to the eigenvalues
of $X_N$. Rather, we work with the correlation functions of its entries.

Let $X_N = [X_N(ij)]$ be a GUE random matrix. Mixed moments of the random
variables $X_N(ij)$, i.e. expectations of the form

\[
\mathbb{E}\left[ \prod_{k=1}^n X_N(i(k)j(k)) \right],
\]

where $i, j$ are functions $[n] \to [N]$, are called correlation functions. All correlations
may be computed in terms of pair correlations (i.e. covariances)

\[
\mathbb{E}[X_N(ij) X_N(kl)] = \mathbb{E}[X_N(ij) \overline{X_N(lk)}] = \frac{\delta_{il}\delta_{jk}}{N}
\]

using a convenient combinatorial formula known as Wick's formula. This formula,
named for the Italian physicist Gian-Carlo Wick, is yet another manifestation of
the moment-cumulant/exponential formulas. It asserts that

\[
\mathbb{E}\left[ \prod_{k=1}^n X_N(i(k)j(k)) \right] = \sum_{\pi \in P_2(n)} \prod_{\{r,s\} \in \pi} \mathbb{E}[X_N(i(r)j(r)) X_N(i(s)j(s))]
\]

for any integer $n \ge 1$ and functions $i, j : [n] \to [N]$. The sum on the right hand
side is taken over all pair partitions of $[n]$, and the product is over the blocks of $\pi$.
For example,

\[
\mathbb{E}[X_N(i(1)j(1)) X_N(i(2)j(2)) X_N(i(3)j(3))] = 0
\]

since there are no pairings on three points, whereas

\[
\begin{aligned}
\mathbb{E}[X_N(i(1)j(1)) & X_N(i(2)j(2)) X_N(i(3)j(3)) X_N(i(4)j(4))] \\
&= \mathbb{E}[X_N(i(1)j(1)) X_N(i(2)j(2))] \, \mathbb{E}[X_N(i(3)j(3)) X_N(i(4)j(4))] \\
&\quad + \mathbb{E}[X_N(i(1)j(1)) X_N(i(3)j(3))] \, \mathbb{E}[X_N(i(2)j(2)) X_N(i(4)j(4))] \\
&\quad + \mathbb{E}[X_N(i(1)j(1)) X_N(i(4)j(4))] \, \mathbb{E}[X_N(i(2)j(2)) X_N(i(3)j(3))],
\end{aligned}
\]

corresponding to the three pair partitions $\{1,2\} \sqcup \{3,4\}$, $\{1,3\} \sqcup \{2,4\}$, $\{1,4\} \sqcup \{2,3\}$
of $[4]$. The Wick formula is a special feature of Gaussian random variables which,
ultimately, is a consequence of the moment formula

\[
\mathbb{E}[X^n] = \sum_{\pi \in P_2(n)} 1
\]

for a single standard real Gaussian $X$ which we proved in Lecture One. A proof of
the Wick formula may be found in Alexandre Zvonkin's expository article [46].
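The Wick formula is easy to evaluate mechanically for small $n$. The following Python sketch enumerates pair partitions and sums the products of covariances, using the GUE covariance $\delta_{il}\delta_{jk}/N$ stated above. The function names (`pair_partitions`, `gue_covariance`, `wick_expectation`) are hypothetical, and exact rational arithmetic is used so that the results can be compared with hand computations.

```python
from fractions import Fraction


def pair_partitions(points):
    """Yield every pair partition of the list `points` as a list of 2-tuples."""
    if not points:
        yield []
        return
    if len(points) % 2 == 1:
        return  # no pairings on an odd number of points
    first, rest = points[0], points[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in pair_partitions(remaining):
            yield [(first, partner)] + tail


def gue_covariance(a, b, n):
    """E[X(ij) X(kl)] = delta_{il} delta_{jk} / n for matrix positions a=(i,j), b=(k,l)."""
    (i, j), (k, l) = a, b
    return Fraction(1, n) if (i == l and j == k) else Fraction(0)


def wick_expectation(entries, n):
    """E[prod_k X(entries[k])] as a sum over pair partitions of the positions."""
    total = Fraction(0)
    for pairing in pair_partitions(list(range(len(entries)))):
        term = Fraction(1)
        for r, s in pairing:
            term *= gue_covariance(entries[r], entries[s], n)
        total += term
    return total


# E[X(12) X(21) X(12) X(21)] = E|X(12)|^4 for a 5 x 5 matrix.
print(wick_expectation([(1, 2), (2, 1), (1, 2), (2, 1)], 5))
```

In the example only two of the three pairings contribute, which reproduces the fourth absolute moment $2/N^2$ of a complex Gaussian with $\mathbb{E}|X|^2 = 1/N$.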

We now compute the moments of the trace of a GUE matrix $X_N$ using the Wick
formula, and then take the $N \to \infty$ limit. We have

\[
\tau_N[X_N^n] = \frac{1}{N} \sum_{i:[n] \to [N]} \mathbb{E}[X_N(i(1)i(2)) X_N(i(2)i(3)) \dots X_N(i(n)i(1))]
= \frac{1}{N} \sum_{i:[n] \to [N]} \mathbb{E}\left[ \prod_{k=1}^n X_N(i(k)i\gamma(k)) \right],
\]

where $\gamma = (1\ 2\ \dots\ n)$ is the full forward cycle in the symmetric group $S(n)$. Let
us apply the Wick formula to each term of this sum, and then use the covariance
structure of the matrix elements. We obtain

\[
\mathbb{E}\left[ \prod_{k=1}^n X_N(i(k)i\gamma(k)) \right]
= \sum_{\pi \in P_2(n)} \prod_{\{r,s\} \in \pi} \mathbb{E}[X_N(i(r)i\gamma(r)) X_N(i(s)i\gamma(s))]
= N^{-n/2} \sum_{\pi \in P_2(n)} \prod_{\{r,s\} \in \pi} \delta_{i(r) i\gamma(s)} \delta_{i(s) i\gamma(r)}.
\]

Now, any pair partition of $[n]$ can be viewed as a product of disjoint two-cycles
in $S(n)$. For example, the three pair partitions of $[4]$ enumerated above may be
viewed as the fixed point free involutions

\[
(1\ 2)(3\ 4), \quad (1\ 3)(2\ 4), \quad (1\ 4)(2\ 3)
\]

in $S(4)$. This is a useful shift in perspective because partitions are inert
combinatorial objects whereas permutations are functions which act on points. Our
computation above may thus be re-written as

\[
\mathbb{E}\left[ \prod_{k=1}^n X_N(i(k)i\gamma(k)) \right] = N^{-n/2} \sum_{\pi \in P_2(n)} \prod_{k=1}^n \delta_{i(k) i\gamma\pi(k)}.
\]

Putting this all together and changing order of summation, we obtain

\[
\tau_N[X_N^n] = N^{-1-n/2} \sum_{i:[n] \to [N]} \sum_{\pi \in P_2(n)} \prod_{k=1}^n \delta_{i(k) i\gamma\pi(k)}
= N^{-1-n/2} \sum_{\pi \in P_2(n)} \sum_{i:[n] \to [N]} \prod_{k=1}^n \delta_{i(k) i\gamma\pi(k)},
\]

from which we see that a term of the internal sum is non-zero (and then equal to one)
if and only if the function $i : [n] \to [N]$ is constant on the cycles of the permutation
$\gamma\pi \in S(n)$. In order to build such a function, we must specify one of $N$ possible
values to be taken on each cycle. We thus obtain

\[
\tau_N[X_N^n] = \sum_{\pi \in P_2(n)} N^{c(\gamma\pi) - 1 - n/2},
\]

where $c(\sigma)$ denotes the number of cycles in the disjoint cycle decomposition of a
permutation $\sigma \in S(n)$. For example, when $n = 3$ we have $\tau_N[X_N^3] = 0$ since there
are no fixed point free involutions in $S(3)$. In order to compute $\tau_N[X_N^4]$, we first
compute the product of $\gamma$ with all fixed point free involutions in $S(4)$,

\[
\begin{aligned}
(1\ 2\ 3\ 4)(1\ 2)(3\ 4) &= (1\ 3)(2)(4) \\
(1\ 2\ 3\ 4)(1\ 3)(2\ 4) &= (1\ 4\ 3\ 2) \\
(1\ 2\ 3\ 4)(1\ 4)(2\ 3) &= (2\ 4)(1)(3),
\end{aligned}
\]

and from this we obtain

\[
\tau_N[X_N^4] = 2 + \frac{1}{N^2}.
\]
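The formula $\tau_N[X_N^n] = \sum_{\pi} N^{c(\gamma\pi) - 1 - n/2}$ can be tabulated by brute force for small $n$. The sketch below (with hypothetical function names, and points labelled $0, \dots, n-1$ rather than $1, \dots, n$) multiplies the full forward cycle by every fixed point free involution and groups the results by the exponent $g = (1 + n/2 - c(\gamma\pi))/2$, so that each pairing counted under $g$ contributes $N^{-2g}$.

```python
from collections import Counter


def pairings(points):
    """Yield every pair partition of the list `points` as a list of 2-tuples."""
    if not points:
        yield []
        return
    if len(points) % 2 == 1:
        return
    first, rest = points[0], points[1:]
    for idx, partner in enumerate(rest):
        for tail in pairings(rest[:idx] + rest[idx + 1:]):
            yield [(first, partner)] + tail


def cycle_count(perm):
    """Number of cycles of the permutation k -> perm[k] of range(len(perm))."""
    seen, count = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            count += 1
            k = start
            while k not in seen:
                seen.add(k)
                k = perm[k]
    return count


def genus_counts(n):
    """Counter {g: number of pairings pi with c(gamma pi) = 1 + n/2 - 2g}."""
    gamma = [(k + 1) % n for k in range(n)]  # the full forward cycle
    counts = Counter()
    for pairing in pairings(list(range(n))):
        pi = list(range(n))
        for r, s in pairing:
            pi[r], pi[s] = s, r
        gamma_pi = [gamma[pi[k]] for k in range(n)]  # apply pi first, then gamma
        counts[(1 + n // 2 - cycle_count(gamma_pi)) // 2] += 1
    return counts


for n in (4, 6, 8):
    print(n, dict(genus_counts(n)))
```

For $n = 4$ the output records two pairings with $g = 0$ and one with $g = 1$, in agreement with $\tau_N[X_N^4] = 2 + N^{-2}$.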

More generally, $\tau_N[X_N^n] = 0$ whenever $n$ is odd since there are no pairings on an
odd number of points. When $n = 2k$ is even the product $\gamma\pi$ has the form

\[
\gamma\pi = (1\ 2\ \dots\ 2k)(s_1\ t_1)(s_2\ t_2) \dots (s_k\ t_k).
\]

In this product, each transposition factor $(s_i\ t_i)$ acts either as a "cut" or as a "join",
meaning that it may either cut a cycle of $(1\ 2\ \dots\ 2k)(s_1\ t_1) \dots (s_{i-1}\ t_{i-1})$ in two,
or join two disjoint cycles together into one. More geometrically, we can view the
product $\gamma\pi$ as a walk of length $k$ on the (right) Cayley graph of $S(2k)$ which starts
at $\gamma$; this walk is non-backtracking and each step taken changes the distance from
the identity permutation by $\pm 1$, see Figure 15.

Figure 15. Walks corresponding to the products $\gamma\pi$ in $S(4)$.

A cut (step towards the identity) occurs when $s_i$ and $t_i$ reside on the same cycle
in the disjoint cycle decomposition of $(1\ 2\ \dots\ 2k)(s_1\ t_1) \dots (s_{i-1}\ t_{i-1})$, while a
join (step away from the identity) occurs when $s_i$ and $t_i$ are on different cycles. In
general, the number of cycles in the product will be

\[
c(\gamma\pi) = 1 + \#\text{cuts} - \#\text{joins},
\]

so $c(\gamma\pi)$ is maximal at $c(\gamma\pi) = 1 + k$ when $\gamma$ is acted on by a sequence of $k$ cut
transpositions. In this case we get a contribution of $N^{1+k-1-k} = N^0$ to $\tau_N[X_N^{2k}]$. In
fact, we always have

\[
\#\text{cuts} - \#\text{joins} = k - 2g
\]

for some non-negative integer $g$, leading to a contribution of the form $N^{-2g}$ and
resulting in the formula

\[
\tau_N[X_N^{2k}] = \sum_{g \ge 0} \frac{\varepsilon_g(2k)}{N^{2g}},
\]

where $\varepsilon_g(2k)$ is the number of products $\gamma\pi$ of the long cycle with a fixed point free
involution in $S(2k)$ which terminate at a point of the sphere $\partial B(\mathrm{id}, k - 1 + 2g)$
(recall that a permutation in $S(2k)$ with $c$ cycles lies at distance $2k - c$ from the
identity). We are only interested in the first term of this expansion, $\varepsilon_0(2k)$, which
counts fixed point free involutions in $S(2k)$ entirely composed of cuts. It is not
difficult to see that $(s_1\ t_1) \dots (s_k\ t_k)$ is a sequence of cuts for $\gamma$ if and only if it
corresponds to a non-crossing pair partition of $[2k]$, and as we know the number of
these is $\mathrm{Cat}_k$.
We have now shown that

\[
\lim_{N \to \infty} \tau_N[X_N^n] = \lim_{N \to \infty} (\mathbb{E} \otimes \mathrm{tr}_N)[X_N^n] =
\begin{cases} 0, & n \text{ odd} \\ \mathrm{Cat}_{n/2}, & n \text{ even} \end{cases}
\]

for a GUE matrix $X_N$. This establishes that $X_N$ is an asymptotic random matrix
model of a single semicircular random variable $X$. It remains to use this fact to
construct a sequence of pairs of random matrices which model a pair $X, Y$ of freely
independent semicircular random variables.

What should we be looking for? Let $X^{(1)}, X^{(2)}$ be a pair of free semicircular
random variables. Let $e : [n] \to [2]$ be a function, and apply the free
moment-cumulant formula to the corresponding mixed moment:

\[
\tau[X^{(e(1))} \dots X^{(e(n))}]
= \sum_{\pi \in NC(n)} \prod_{B \in \pi} \kappa_{|B|}\big( X^{(e(i))} : i \in B \big)
= \sum_{\pi \in NC_2(n)} \prod_{\{r,s\} \in \pi} \kappa_2\big( X^{(e(r))}, X^{(e(s))} \big).
\]

This reduction occurs because $X^{(1)}, X^{(2)}$ are free, so that all mixed free cumulants
in these variables vanish. Moreover, these variables are semicircular so only order
two pure cumulants survive. We can think of the function $e$ as a bicolouring of $[n]$.
The formula for mixed moments of a semicircular pair then becomes

\[
\tau[X^{(e(1))} \dots X^{(e(n))}] = \sum_{\pi \in NC_2^{(e)}(n)} 1,
\]

where $NC_2^{(e)}(n)$ is the set of non-crossing pair partitions of $[n]$ which pair
elements of the same colour. This is very much like the Wick formula for Gaussian
expectations, but with Gaussians replaced by semicirculars and summation restricted
to non-crossing pairings. We need to realize this structure in the combinatorics of
GUE random matrices.
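The mixed moments of a free semicircular pair are therefore pure counting problems. As an illustration, the following Python sketch counts the non-crossing pairings that respect a given colouring; the function names are hypothetical, and the colouring $e$ is passed as a list `colours` with `colours[k]` the colour of the point $k+1$.

```python
def pairings(points):
    """Yield every pair partition of the sorted list `points` as a list of 2-tuples."""
    if not points:
        yield []
        return
    if len(points) % 2 == 1:
        return
    first, rest = points[0], points[1:]
    for idx, partner in enumerate(rest):
        for tail in pairings(rest[:idx] + rest[idx + 1:]):
            yield [(first, partner)] + tail


def is_noncrossing(pairing):
    """True if no two blocks (a, b), (c, d) with a < b, c < d satisfy a < c < b < d."""
    for a, b in pairing:
        for c, d in pairing:
            if a < c < b < d:
                return False
    return True


def semicircular_mixed_moment(colours):
    """Number of non-crossing pairings of the positions that pair equal colours."""
    count = 0
    for pairing in pairings(list(range(len(colours)))):
        if all(colours[r] == colours[s] for r, s in pairing) and is_noncrossing(pairing):
            count += 1
    return count


# tau[X1 X2 X2 X1] and tau[X1 X2 X1 X2] for a free semicircular pair.
print(semicircular_mixed_moment([1, 2, 2, 1]), semicircular_mixed_moment([1, 2, 1, 2]))
```

The alternating word has moment zero because its only colour-respecting pairing is crossing, which is the simplest instance of the difference between free and classical independence.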

This construction goes as follows. Let $Z_N^{(e)}(ij)$, $1 \le e \le 2$, $1 \le i, j \le N$ be a
collection of $2N^2$ iid centred complex Gaussian random variables of variance $1/N$.
Form the corresponding Ginibre matrices $Z_N^{(1)} = [Z_N^{(1)}(ij)]$, $Z_N^{(2)} = [Z_N^{(2)}(ij)]$ and
GUE matrices $X_N^{(1)} = \frac{1}{2}(Z_N^{(1)} + (Z_N^{(1)})^*)$, $X_N^{(2)} = \frac{1}{2}(Z_N^{(2)} + (Z_N^{(2)})^*)$. The resulting
covariance structure of matrix elements is

\[
\mathbb{E}[X_N^{(p)}(ij) X_N^{(q)}(kl)] = \mathbb{E}[X_N^{(p)}(ij) \overline{X_N^{(q)}(lk)}] = \frac{\delta_{il}\delta_{jk}\delta_{pq}}{N}.
\]

We can prove that $X_N^{(1)}, X_N^{(2)}$ are asymptotically free by showing that

\[
\lim_{N \to \infty} \tau_N[X_N^{(e(1))} \dots X_N^{(e(n))}] = |NC_2^{(e)}(n)|,
\]

and this can in turn be proved using the Wick formula and the above covariance
structure. Computations almost exactly like those appearing in the one-matrix case
lead to the formula

\[
\tau_N[X_N^{(e(1))} \dots X_N^{(e(n))}] = \sum_{\pi \in P_2^{(e)}(n)} N^{c(\gamma\pi) - 1 - n/2},
\]

with the summation being taken over the set $P_2^{(e)}(n)$ of pairings on $[n]$ which
respect the colouring $e : [n] \to [2]$. Arguing as above, each such pairing makes a
contribution of the form $N^{-2g}$ for some $g \ge 0$, and those which make contributions
on the leading order $N^0$ correspond to sequences of cut transpositions for the full
forward cycle $\gamma$, which we know come from non-crossing pairings. So in the limit
$N \to \infty$ this expectation converges to $|NC_2^{(e)}(n)|$, as required.

3.5. Random matrix model of a free pair with one semicircle. In the previous
subsection we modelled a free pair of semicircular random variables $X, Y$ living
in an abstract non-commutative probability space $(\mathcal{A}, \tau)$ using a sequence of
independent GUE random matrices $X_N, Y_N$ living in random matrix space $(\mathcal{A}_N, \tau_N)$.

It is reasonable to wonder whether we have not overlooked the possibility of
modelling $X, Y$ in a simpler way, namely using deterministic matrices. Indeed, we
have

\[
\tau[X^n] = \int_{\mathbb{R}} t^n \, \mu_X(dt)
\]

with

\[
\mu_X(dt) = \frac{1}{2\pi} \sqrt{4 - t^2} \, dt
\]

the Wigner semicircle measure, and this fact leads to a deterministic matrix model
for $X$. For each $N \ge 1$, define the $N$th classical locations
$L_N(1) < L_N(2) < \dots < L_N(N)$ of $\mu_X$ implicitly by

\[
\int_{-2}^{L_N(i)} \mu_X(dt) = \frac{i}{N}.
\]

That is, we start at $t = -2$ and integrate along the semicircle until a mass of
$i/N$ is achieved, at which time we mark off the corresponding location $L_N(i)$ on
the $t$-axis. The measure $\mu_N$ which places mass $1/N$ at each of the $N$th classical
locations converges weakly to $\mu_X$ as $N \to \infty$. Consequently, the diagonal matrix
$X_N$ with entries $X_N(ij) = \delta_{ij} L_N(i)$ is a random variable in deterministic matrix
space $(\mathcal{M}_N(\mathbb{C}), \mathrm{tr}_N)$ which models $X$,

\[
\lim_{N \to \infty} \mathrm{tr}_N[X_N^n] = \tau[X^n].
\]

Since $X$ and $Y$ are equidistributed, putting $Y_N := X_N$ we have that $X_N$ models
$X$ and $Y_N$ models $Y$. However, $X_N$ and $Y_N$ are not asymptotically free. Indeed,
asymptotic freeness of $X_N$ and $Y_N$ would imply that

\[
\lim_{N \to \infty} \mathrm{tr}_N[X_N Y_N] = \lim_{N \to \infty} \mathrm{tr}_N[X_N] \lim_{N \to \infty} \mathrm{tr}_N[Y_N] = 0,
\]

but instead we have

\[
\mathrm{tr}_N[X_N Y_N] = \mathrm{tr}_N[X_N^2] = \frac{L_N(1)^2 + \dots + L_N(N)^2}{N},
\]

the mean squared classical locations of the Wigner measure, which is strictly
positive and converges to $\tau[X^2] = 1$ as $N \to \infty$. Thus while $X_N$ and $Y_N$ model
$X$ and $Y$ respectively, they cannot model the free relation between them. However,
this does not preclude the possibility that a pair of free random variables can be
modelled by one random and one deterministic matrix.
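The classical locations are easy to compute numerically, which makes the failure of asymptotic freeness for the deterministic model visible. The sketch below is one possible implementation, not taken from the notes: it uses the closed form of the semicircle distribution function, $F(t) = \frac{1}{2} + \frac{t\sqrt{4 - t^2}}{4\pi} + \frac{1}{\pi}\arcsin(t/2)$, and inverts it by bisection. All function names are hypothetical.

```python
import math


def semicircle_cdf(t):
    """Mass that the Wigner semicircle measure assigns to [-2, t], for -2 <= t <= 2."""
    return 0.5 + t * math.sqrt(4.0 - t * t) / (4.0 * math.pi) + math.asin(t / 2.0) / math.pi


def classical_locations(n):
    """Locations L(1) < ... < L(n) with semicircle_cdf(L(i)) = i / n, found by bisection."""
    locations = []
    for i in range(1, n + 1):
        target = i / n
        lo, hi = -2.0, 2.0
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if semicircle_cdf(mid) < target:
                lo = mid
            else:
                hi = mid
        locations.append(0.5 * (lo + hi))
    return locations


def mean_square(n):
    """tr_N[X_N Y_N] = tr_N[X_N^2] for the diagonal model with Y_N = X_N."""
    return sum(x * x for x in classical_locations(n)) / n


for n in (1, 2, 10, 1000):
    print(n, mean_square(n))
```

The printed values stay bounded away from zero and approach the second moment of the semicircle, whereas asymptotic freeness would require the limit to vanish.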

Let $X$ and $Y$ be a pair of free random variables with $X$ semicircular, and $Y$ of
arbitrary distribution. Let $X_N$ be a sequence of GUE matrices modelling $X$, and
suppose that $Y_N$ is a sequence of deterministic matrices modelling $Y$,

\[
\lim_{N \to \infty} \mathrm{tr}_N[Y_N^n] = \tau[Y^n].
\]

$X_N$ lives in random matrix space
$(\mathcal{A}_N, \tau_N) = (L^{\infty-}(\Omega, \mathcal{F}, P) \otimes \mathcal{M}_N(\mathbb{C}), \mathbb{E} \otimes \mathrm{tr}_N)$
while $Y_N$ lives in deterministic matrix space $(\mathcal{M}_N(\mathbb{C}), \mathrm{tr}_N)$, so a priori it is
meaningless to speak of the potential asymptotic free independence of $X_N$ and $Y_N$.
However, we may think of a deterministic matrix as a random matrix whose entries
are constant random variables in $L^{\infty-}(\Omega, \mathcal{F}, P)$. This corresponds to an
embedding of deterministic matrix space in random matrix space satisfying
$\tau_N|_{\mathcal{M}_N(\mathbb{C})} = (\mathbb{E} \otimes \mathrm{tr}_N)|_{\mathcal{M}_N(\mathbb{C})} = \mathrm{tr}_N$. From this point of view, $Y_N$ is a
random matrix model of $Y$ and we can consider the possibility that $X_N, Y_N \in \mathcal{A}_N$
are asymptotically free with respect to $\tau_N$. We now show that this is indeed the case.

As in the previous subsection, we proceed by identifying the combinatorial
structure governing the target pair $X, Y$ and then looking for this same structure in the
$N \to \infty$ asymptotics of $X_N, Y_N$. Our target is a pair of free random variables
with $X$ semicircular and $Y$ arbitrary. Understanding their joint distribution means
understanding the collection of mixed moments

\[
\tau[X^{p(1)} Y^{q(1)} \dots X^{p(n)} Y^{q(n)}]
\]

with $n \ge 1$ and $p, q : [n] \to \{0, 1, 2, \dots\}$. This amounts to understanding mixed
moments of the form

\[
\tau[X Y^{q(1)} \dots X Y^{q(n)}],
\]

since we can artificially insert copies of $Y^0 = 1_{\mathcal{A}}$ to break up powers of $X$ greater
than one. We can expand this expectation using the free moment-cumulant formula
and simplify the resulting expression using the fact that mixed cumulants in free
random variables vanish. Further simplification results from the fact that, since $X$
is semicircular, its only non-vanishing pure cumulant is $\kappa_2(X) = 1$. This leads to
a formula for $\tau[X Y^{q(1)} \dots X Y^{q(n)}]$ which is straightforward but whose statement
requires some notions which we have not covered (in particular, the complement
of a non-crossing partition, see [28]). However, in the case where $\tau$ is a tracial
expectation, meaning that $\tau[AB] = \tau[BA]$, the formula in question can be stated
more simply as

\[
\tau[X Y^{q(1)} \dots X Y^{q(n)}] = \sum_{\pi \in NC_2(n)} \tau_{\pi\gamma}[Y^{q(1)}, \dots, Y^{q(n)}].
\]

Here, as in the last subsection, we think of a pair partition $\pi \in P_2(n)$ as a product
of disjoint two-cycles in the symmetric group $S(n)$, and $\gamma$ is the full forward cycle
$(1\ 2\ \dots\ n)$. Given a permutation $\sigma \in S(n)$, the expression $\tau_\sigma[A_1, \dots, A_n]$ is defined
to be the product of $\tau$ extended over the cycles of $\sigma$. For example,

\[
\tau_{(1\ 6\ 2)(4\ 5)(3)}[A_1, A_2, A_3, A_4, A_5, A_6] = \tau[A_1 A_6 A_2] \, \tau[A_4 A_5] \, \tau[A_3].
\]

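This definition is kosher since $\tau$ is tracial. We now have our proof strategy: we will
prove that $X_N, Y_N$ are asymptotically free by showing that

\[
\lim_{N \to \infty} \tau_N[X_N Y_N^{q(1)} \dots X_N Y_N^{q(n)}] = \sum_{\pi \in NC_2(n)} \tau_{\pi\gamma}[Y^{q(1)}, \dots, Y^{q(n)}].
\]

The computation proceeds much as in the last section: we expand everything
in sight and apply the Wick formula. We have

\[
\tau_N[X_N Y_N^{q(1)} \dots X_N Y_N^{q(n)}]
= \frac{1}{N} \sum_a \mathbb{E}[X_N(a(1)a(2)) Y_N^{q(1)}(a(2)a(3)) \dots X_N(a(2n-1)a(2n)) Y_N^{q(n)}(a(2n)a(1))],
\]

the summation being over all functions $a : [2n] \to [N]$. Let us reparameterize each
term of the sum with $i, j : [n] \to [N]$ defined by

\[
(a(1), a(2), \dots, a(2n-1), a(2n)) = (i(1), j(1), \dots, i(n), j(n)).
\]

Our computation so far becomes

\[
\tau_N[X_N Y_N^{q(1)} \dots X_N Y_N^{q(n)}]
= \frac{1}{N} \sum_{i,j} \mathbb{E}\left[ \prod_{k=1}^n X_N(i(k)j(k)) \right] \prod_{k=1}^n Y_N^{q(k)}(j(k)i\gamma(k)).
\]

Applying the Wick formula, the calculation evolves as follows (here $\mathrm{Tr}_\sigma$ is defined
like $\tau_\sigma$, with the unnormalized trace $\mathrm{Tr} = N \, \mathrm{tr}_N$ in place of $\tau$):

\[
\begin{aligned}
\tau_N[X_N Y_N^{q(1)} \dots X_N Y_N^{q(n)}]
&= \frac{1}{N} \sum_{i,j} \sum_{\pi \in P_2(n)} \prod_{\{r,s\} \in \pi} \mathbb{E}[X_N(i(r)j(r)) X_N(i(s)j(s))] \prod_{k=1}^n Y_N^{q(k)}(j(k)i\gamma(k)) \\
&= N^{-1-n/2} \sum_{i,j} \sum_{\pi \in P_2(n)} \prod_{k=1}^n \delta_{i(k) j\pi(k)} Y_N^{q(k)}(j(k)i\gamma(k)) \\
&= N^{-1-n/2} \sum_{\pi \in P_2(n)} \sum_{j} \prod_{k=1}^n Y_N^{q(k)}(j(k)j\pi\gamma(k)) \\
&= N^{-1-n/2} \sum_{\pi \in P_2(n)} \mathrm{Tr}_{\pi\gamma}[Y_N^{q(1)}, \dots, Y_N^{q(n)}] \\
&= \sum_{\pi \in P_2(n)} N^{c(\pi\gamma) - 1 - n/2} \, \mathrm{tr}_{\pi\gamma}[Y_N^{q(1)}, \dots, Y_N^{q(n)}].
\end{aligned}
\]

As in the previous subsection, the dominant contributions to this sum are of order
$N^0$ and come from those pair partitions $\pi \in P_2(n)$ for which $c(\pi\gamma)$ is maximal, and
these are the non-crossing pairings. Hence we obtain

\[
\lim_{N \to \infty} \tau_N[X_N Y_N^{q(1)} \dots X_N Y_N^{q(n)}] = \sum_{\pi \in NC_2(n)} \tau_{\pi\gamma}[Y^{q(1)}, \dots, Y^{q(n)}],
\]

as required.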
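To see what the limiting formula says in small cases, one can list the cycle structure of $\pi\gamma$ for each non-crossing pairing $\pi$. The following Python sketch does this with hypothetical function names; cycles are reported on the points $1, \dots, n$, and the product $\pi\gamma$ is read as "apply $\gamma$ first, then $\pi$", which is an assumption about the composition convention (the number and sizes of the cycles do not depend on it).

```python
def pairings(points):
    """Yield every pair partition of the sorted list `points` as a list of 2-tuples."""
    if not points:
        yield []
        return
    if len(points) % 2 == 1:
        return
    first, rest = points[0], points[1:]
    for idx, partner in enumerate(rest):
        for tail in pairings(rest[:idx] + rest[idx + 1:]):
            yield [(first, partner)] + tail


def is_noncrossing(pairing):
    """True if no two blocks (a, b), (c, d) satisfy a < c < b < d."""
    return not any(a < c < b < d for a, b in pairing for c, d in pairing)


def cycles(perm):
    """Cycles of k -> perm[k], each written as a tuple of 1-based labels."""
    seen, result = set(), []
    for start in range(len(perm)):
        if start not in seen:
            cycle, k = [], start
            while k not in seen:
                seen.add(k)
                cycle.append(k + 1)
                k = perm[k]
            result.append(tuple(cycle))
    return result


def noncrossing_cycle_structures(n):
    """For each non-crossing pairing pi of [n], the cycles of pi gamma."""
    gamma = [(k + 1) % n for k in range(n)]
    structures = []
    for pairing in pairings(list(range(n))):
        if is_noncrossing(pairing):
            pi = list(range(n))
            for r, s in pairing:
                pi[r], pi[s] = s, r
            structures.append(cycles([pi[gamma[k]] for k in range(n)]))
    return structures


print(noncrossing_cycle_structures(4))
```

For $n = 4$ the two structures printed correspond to $\tau[XY^{q(1)}XY^{q(2)}XY^{q(3)}XY^{q(4)}] = \tau[Y^{q(1)}]\,\tau[Y^{q(2)}Y^{q(4)}]\,\tau[Y^{q(3)}] + \tau[Y^{q(1)}Y^{q(3)}]\,\tau[Y^{q(2)}]\,\tau[Y^{q(4)}]$.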

3.6. Random matrix model of an arbitrary free pair. In the last section we
saw that a pair of free random variables can be modelled by one random and one
deterministic matrix provided that at least one of the target variables is
semicircular. In this case, the semicircular target is modelled by a sequence of GUE
random matrices.

In this section we show that any pair of free random variables can be modelled
by one random and one deterministic matrix, provided each target variable can
be individually modelled by a sequence of deterministic matrices. The idea is to
randomly rotate one of the deterministic matrix models so as to create the free
relation.

Let $X, Y$ be a pair of free random variables living in an abstract non-commutative
probability space $(\mathcal{A}, \tau)$. We make no assumption on their moments. What we
assume is the existence of a pair of deterministic matrix models

\[
\tau[X^n] = \lim_{N \to \infty} \mathrm{tr}_N[X_N^n], \qquad \tau[Y^n] = \lim_{N \to \infty} \mathrm{tr}_N[Y_N^n].
\]

If $X, Y$ happen to have distributions $\mu_X, \mu_Y$ which are compactly supported
probability measures on $\mathbb{R}$, then such models can always be constructed. In particular,
this will be the case if $X, Y$ are bounded self-adjoint random variables living in a
$*$-probability space.

As in the previous subsection, we view $X_N, Y_N$ as random matrices with constant
entries so that they reside in random matrix space $(\mathcal{A}_N, \tau_N)$, with the $\mathbb{E}$ part of
$\tau_N = \mathbb{E} \otimes \mathrm{tr}_N$ acting trivially. As we saw above, there is no guarantee that $X_N, Y_N$
are asymptotically free. On the other hand, we also saw that special pairs of free
random variables can be modelled by one random and one deterministic matrix.
Therefore it is reasonable to hope that making $X_N$ genuinely random might lead to
asymptotic freeness. We have to randomize $X_N$ in such a way that its moments will
be preserved. This can be achieved via conjugation by a unitary random matrix
$U_N \in \mathcal{A}_N$,

\[
X_N \mapsto U_N X_N U_N^*.
\]

The deterministic matrix $X_N$ and its randomized version $U_N X_N U_N^*$ have the same
moments since

\[
\begin{aligned}
\tau_N[(U_N X_N U_N^*)^n] &= \tau_N[U_N X_N^n U_N^*] \\
&= (\mathbb{E} \otimes \mathrm{tr}_N)[X_N^n] \\
&= \mathrm{tr}_N[X_N^n].
\end{aligned}
\]

Consequently, the sequence $U_N X_N U_N^*$ is a random matrix model for $X$.

We aim to prove that $U_N X_N U_N^*$ and $Y_N$ are asymptotically free. Since we are
making no assumptions on the limiting variables $X, Y$, we cannot verify this by
looking for special structure in the limiting mixed moments of $U_N X_N U_N^*$ and $Y_N$,
as we did above. Instead, we must verify asymptotic freeness directly, using the
definition:

\[
\lim_{N \to \infty} \tau_N[f_1(U_N X_N U_N^*) g_1(Y_N) \dots f_n(U_N X_N U_N^*) g_n(Y_N)] = 0
\]

whenever $f_1, g_1, \dots, f_n, g_n$ are polynomials such that

\[
\lim_{N \to \infty} \tau_N[f_1(U_N X_N U_N^*)] = \lim_{N \to \infty} \tau_N[g_1(Y_N)] = \dots = \lim_{N \to \infty} \tau_N[f_n(U_N X_N U_N^*)] = \lim_{N \to \infty} \tau_N[g_n(Y_N)] = 0.
\]

Though the brute force verification of this criterion may seem an impossible task,
we will see that it can be accomplished for a well-chosen sequence of unitary random
matrices $U_N$. Let us advance as far as possible before specifying $U_N$ precisely.

As an initial reduction, note the identity

\[
\tau_N[f_1(U_N X_N U_N^*) g_1(Y_N) \dots f_n(U_N X_N U_N^*) g_n(Y_N)]
= \tau_N[U_N f_1(X_N) U_N^* g_1(Y_N) \dots U_N f_n(X_N) U_N^* g_n(Y_N)].
\]

Since the $f_i$'s and $g_j$'s are polynomials and $\tau_N$ is linear, the right hand side of this
equation may be expanded as a sum of monomial expectations,

\[
\tau_N[U_N f_1(X_N) U_N^* g_1(Y_N) \dots U_N f_n(X_N) U_N^* g_n(Y_N)]
= \sum_{p,q} c(pq) \, \tau_N[U_N X_N^{p(1)} U_N^* Y_N^{q(1)} \dots U_N X_N^{p(n)} U_N^* Y_N^{q(n)}],
\]

weighted by some scalar coefficients $c(pq)$, the sum being over functions
$p : [n] \to \{0, \dots, \max \deg f_i\}$, $q : [n] \to \{0, \dots, \max \deg g_j\}$. Each monomial
expectation can in turn be expanded as

\[
\begin{aligned}
\tau_N[U_N X_N^{p(1)} U_N^* & Y_N^{q(1)} \dots U_N X_N^{p(n)} U_N^* Y_N^{q(n)}] \\
&= \frac{1}{N} \sum_a \mathbb{E}[U_N(a(1)a(2)) X_N^{p(1)}(a(2)a(3)) U_N^*(a(3)a(4)) Y_N^{q(1)}(a(4)a(5)) \dots U_N^*(a(4n-1)a(4n)) Y_N^{q(n)}(a(4n)a(1))] \\
&= \frac{1}{N} \sum_a \mathbb{E}[U_N(a(1)a(2)) X_N^{p(1)}(a(2)a(3)) \overline{U_N(a(4)a(3))} Y_N^{q(1)}(a(4)a(5)) \dots \overline{U_N(a(4n)a(4n-1))} Y_N^{q(n)}(a(4n)a(1))].
\end{aligned}
\]

Let us reparameterize the summation index $a : [4n] \to [N]$ by a quadruple of
functions $i, j, i', j' : [n] \to [N]$ according to

\[
(a(1), a(2), a(3), a(4), \dots, a(4n-3), a(4n-2), a(4n-1), a(4n))
= (i(1), j(1), j'(1), i'(1), \dots, i(n), j(n), j'(n), i'(n)).
\]

Our monomial expectations then take the more streamlined form

\[
\tau_N[U_N X_N^{p(1)} U_N^* Y_N^{q(1)} \dots U_N X_N^{p(n)} U_N^* Y_N^{q(n)}]
= \frac{1}{N} \sum_{i,j,i',j'} \mathbb{E}\left[ \prod_{k=1}^n U_N(i(k)j(k)) \overline{U_N(i'(k)j'(k))} \right] \prod_{k=1}^n X_N^{p(k)}(j(k)j'(k)) Y_N^{q(k)}(i'(k)i\gamma(k)),
\]

where as always $\gamma = (1\ 2\ \dots\ n)$ is the full forward cycle in the symmetric group
$S(n)$. In order to go any further with this calculation, we must deal with the
correlation functions

\[
\mathbb{E}\left[ \prod_{k=1}^n U_N(i(k)j(k)) \overline{U_N(i'(k)j'(k))} \right]
\]

of the matrix elements of $U_N$. We would like to have an analogue of the Wick
formula which will enable us to address these correlation functions. A formula of
this type is known for random matrices sampled from the Haar probability measure
on the unitary group $U(N)$.

Haar-distributed unitary matrices are the second most important class of random
matrices after GUE matrices. Like GUE matrices, they can be constructively
obtained from Ginibre matrices. Let $\tilde{Z}_N = \sqrt{N} Z_N$ be an $N \times N$ random
matrix whose entries $\tilde{Z}_N(ij)$ are iid complex Gaussian random variables of
mean zero and variance one. This is a renormalized version of the Ginibre
matrix which we previously used to construct a GUE random matrix. The Ginibre
matrix $\tilde{Z}_N$ is almost surely non-singular. Applying the Gram-Schmidt
orthonormalization procedure to the columns of $\tilde{Z}_N$, we obtain a random unitary matrix
$U_N$ whose distribution in the unitary group $U(N)$ is given by the Haar
probability measure. The entries $U_N(ij)$ are bounded random variables, so $U_N$ is a
non-commutative random variable living in random matrix space $(\mathcal{A}_N, \tau_N)$. The
eigenvalues $\lambda_N(1) = e^{i\theta_N(1)}, \dots, \lambda_N(N) = e^{i\theta_N(N)}$, $0 \le \theta_N(1) \le \dots \le \theta_N(N) < 2\pi$
of $U_N$ form a random point process on the unit circle with joint distribution

\[
P(\theta_N(1) \in I_1, \dots, \theta_N(N) \in I_N) \propto \int_{I_1} \dots \int_{I_N} e^{-N^2 H(\theta_1, \dots, \theta_N)} \, d\theta_1 \dots d\theta_N
\]

for any intervals $I_1, \dots, I_N \subseteq [0, 2\pi]$, where $H$ is the log-gas Hamiltonian [13]

\[
H(\theta_1, \dots, \theta_N) = -\frac{1}{N^2} \sum_{1 \le i \ne j \le N} \log|e^{i\theta_i} - e^{i\theta_j}|.
\]

The random point process on the unit circle driven by this Hamiltonian is known as
the Circular Unitary Ensemble, and $U_N$ is termed a CUE random matrix. As with
GUE random matrices, almost any question about the spectrum of CUE random
matrices can be answered using this explicit formula, see e.g. [8] for a survey of
many interesting results.
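The Gram-Schmidt construction of a CUE matrix described above can be written out directly. The following Python sketch is a plain-list illustration for small $N$ rather than an efficient sampler; the function names are hypothetical, and the complex Gaussians are taken with real and imaginary parts of variance $1/2$ each, which is one reading of "variance one".

```python
import random


def haar_unitary(n, rng):
    """Orthonormalize the columns of an n x n complex Gaussian matrix (Gram-Schmidt)."""
    s = 0.5 ** 0.5  # real and imaginary parts have variance 1/2 (assumed convention)
    columns = [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]
               for _ in range(n)]
    basis = []
    for v in columns:
        w = list(v)
        for u in basis:
            # Remove the component of w along the unit vector u.
            coeff = sum(a.conjugate() * b for a, b in zip(u, w))
            w = [b - coeff * a for a, b in zip(u, w)]
        norm = sum(abs(b) ** 2 for b in w) ** 0.5
        basis.append([b / norm for b in w])
    # basis[j] is the j-th column; return the matrix as a list of rows.
    return [[basis[j][i] for j in range(n)] for i in range(n)]


def unitarity_defect(u):
    """Largest entry of |U* U - I| in absolute value."""
    n = len(u)
    defect = 0.0
    for a in range(n):
        for b in range(n):
            inner = sum(u[k][a].conjugate() * u[k][b] for k in range(n))
            defect = max(defect, abs(inner - (1.0 if a == b else 0.0)))
    return defect


u = haar_unitary(6, random.Random(2011))
print(unitarity_defect(u))  # of the order of rounding error
```

In floating point arithmetic a library QR factorization with the phases of the diagonal of the triangular factor normalized is the usual substitute for explicit Gram-Schmidt; the version above is kept close to the description in the text.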

We are not interested in the eigenvalues of CUE matrices, but rather in the
correlation functions of their matrix elements. These can be handled using a Wick-type
formula known as the Weingarten formula, after the American physicist Donald H.
Weingarten$^1$. Like the Wick formula, the Weingarten formula is a combinatorial
rule which reduces the computation of general correlation functions to the
computation of a special class of correlations. Unfortunately, the Weingarten formula is
more complicated than the Wick formula. It reads:

\[
\mathbb{E}\left[ \prod_{k=1}^n U_N(i(k)j(k)) \overline{U_N(i'(k)j'(k))} \right]
= \sum_{\rho, \sigma \in S(n)} \delta_{i\sigma, i'} \delta_{j\rho, j'} \, \mathbb{E}\left[ \prod_{k=1}^n U_N(kk) \overline{U_N(k\rho^{-1}\sigma(k))} \right].
\]

Note that this formula only makes sense when $N \ge n$, and instead of a sum over
fixed point free involutions we are faced with a double sum over all of $S(n)$. Worse
still, the Weingarten formula does not reduce our problem to the computation of
pair correlators, but only to the computation of arbitrary permutation correlators

\[
\mathbb{E}\left[ \prod_{k=1}^n U_N(kk) \overline{U_N(k\pi(k))} \right], \qquad \pi \in S(n),
\]

and these have a rather complicated structure. Their computation is the subject
of a large literature both in physics and mathematics, a unified treatment of which
may be found in . We delay dealing with these averages for the moment and
press on in our calculation.

$^1$ Further information regarding Weingarten and his colleagues in the first Fermilab theory
group may be found at http://bama.ua.edu/~lclavell/Weston/
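We return to the expression

\[
\tau_N[U_N X_N^{p(1)} U_N^* Y_N^{q(1)} \dots U_N X_N^{p(n)} U_N^* Y_N^{q(n)}]
= \frac{1}{N} \sum_{i,j,i',j'} \mathbb{E}\left[ \prod_{k=1}^n U_N(i(k)j(k)) \overline{U_N(i'(k)j'(k))} \right] \prod_{k=1}^n X_N^{p(k)}(j(k)j'(k)) Y_N^{q(k)}(i'(k)i\gamma(k)),
\]

and apply the Weingarten formula. The calculation evolves as follows:

\[
\begin{aligned}
\tau_N[U_N X_N^{p(1)} & U_N^* Y_N^{q(1)} \dots U_N X_N^{p(n)} U_N^* Y_N^{q(n)}] \\
&= \frac{1}{N} \sum_{i,j,i',j'} \sum_{\rho, \sigma \in S(n)} \delta_{i\sigma, i'} \delta_{j\rho, j'} \, \mathbb{E}\left[ \prod_{k=1}^n U_N(kk) \overline{U_N(k\rho^{-1}\sigma(k))} \right] \prod_{k=1}^n X_N^{p(k)}(j(k)j'(k)) Y_N^{q(k)}(i'(k)i\gamma(k)) \\
&= \frac{1}{N} \sum_{\rho, \sigma \in S(n)} \mathbb{E}\left[ \prod_{k=1}^n U_N(kk) \overline{U_N(k\rho^{-1}\sigma(k))} \right] \sum_{i',j} \prod_{k=1}^n X_N^{p(k)}(j(k)j\rho(k)) Y_N^{q(k)}(i'(k)i'\sigma^{-1}\gamma(k)) \\
&= \frac{1}{N} \sum_{\rho, \sigma \in S(n)} \mathbb{E}\left[ \prod_{k=1}^n U_N(kk) \overline{U_N(k\rho^{-1}\sigma(k))} \right] \mathrm{Tr}_\rho[X_N^{p(1)}, \dots, X_N^{p(n)}] \, \mathrm{Tr}_{\sigma^{-1}\gamma}[Y_N^{q(1)}, \dots, Y_N^{q(n)}] \\
&= \sum_{\rho, \sigma \in S(n)} \mathbb{E}\left[ \prod_{k=1}^n U_N(kk) \overline{U_N(k\rho^{-1}\sigma(k))} \right] N^{c(\rho) + c(\sigma^{-1}\gamma) - 1} \, \mathrm{tr}_\rho[X_N^{p(1)}, \dots, X_N^{p(n)}] \, \mathrm{tr}_{\sigma^{-1}\gamma}[Y_N^{q(1)}, \dots, Y_N^{q(n)}].
\end{aligned}
\]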

At this point we are forced to deal with the permutation correlators
$\mathbb{E}\left[ \prod_{k=1}^n U_N(kk) \overline{U_N(k\pi(k))} \right]$.
Perhaps the most appealing presentation of these expectations is as a power series
in $N^{-1}$. It may be shown [29] that

\[
\mathbb{E}\left[ \prod_{k=1}^n U_N(kk) \overline{U_N(k\pi(k))} \right] = \frac{1}{N^n} \sum_{r=0}^\infty (-1)^r \frac{c_{n,r}(\pi)}{N^r}
\]

for any $\pi \in S(n)$, where the coefficient $c_{n,r}(\pi)$ equals the number of factorizations

\[
\pi = (s_1\ t_1) \dots (s_r\ t_r)
\]

of $\pi$ into $r$ transpositions $(s_i\ t_i) \in S(n)$, $s_i < t_i$, which have the property that

\[
t_1 \le \dots \le t_r.
\]
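For very small $n$ the coefficients $c_{n,r}(\pi)$ can be counted by exhaustive search, which gives a way to test the series against known closed forms. The Python sketch below uses hypothetical function names, labels points $0, \dots, n-1$, and compares the truncated series for $n = 2$ with the standard values $1/(N^2 - 1)$ for the identity and $-1/(N(N^2 - 1))$ for the transposition; these closed forms are quoted as well-known facts about $U(N)$, not taken from the notes.

```python
def compose(p, q):
    """The permutation k -> p[q[k]]."""
    return tuple(p[q[k]] for k in range(len(p)))


def transposition(n, s, t):
    """The transposition (s t) in the symmetric group on range(n)."""
    perm = list(range(n))
    perm[s], perm[t] = t, s
    return tuple(perm)


def monotone_factorization_count(pi, r):
    """Number of factorizations pi = (s_1 t_1)...(s_r t_r), s_i < t_i, t_1 <= ... <= t_r."""
    n = len(pi)

    def extend(current, remaining, min_t):
        if remaining == 0:
            return 1 if current == pi else 0
        total = 0
        for t in range(min_t, n):
            for s in range(t):
                total += extend(compose(current, transposition(n, s, t)), remaining - 1, t)
        return total

    return extend(tuple(range(n)), r, 1)


def correlator_series(pi, big_n, terms):
    """Partial sum N^{-n} sum_{r < terms} (-1)^r c_{n,r}(pi) / N^r."""
    n = len(pi)
    return sum((-1) ** r * monotone_factorization_count(pi, r) / big_n ** r
               for r in range(terms)) / big_n ** n


print(correlator_series((1, 0), 10, 12), -1 / (10 * 99))
print(correlator_series((0, 1), 10, 12), 1 / 99)
```

The search is exponential in $r$, so this is only practical as a sanity check; it is not a method for computing the correlators in general.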

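This series is absolutely convergent for $N \ge n$, but divergent for $N < n$. This will
not trouble us since we are looking for $N \to \infty$ asymptotics with $n$ fixed. Indeed,
let $|\pi| = n - c(\pi)$ denote the distance from the identity permutation to $\pi$ in the
Cayley graph of $S(n)$. Then, since any permutation is either even or odd (so that
$c_{n,r}(\pi) = 0$ unless $r \ge |\pi|$ and $r \equiv |\pi| \pmod 2$), we have

\[
\begin{aligned}
\mathbb{E}\left[ \prod_{k=1}^n U_N(kk) \overline{U_N(k\pi(k))} \right]
&= \frac{1}{N^n} \sum_{r=0}^\infty (-1)^r \frac{c_{n,r}(\pi)}{N^r} \\
&= \frac{(-1)^{|\pi|}}{N^{n+|\pi|}} \sum_{g=0}^\infty \frac{c_{n,|\pi|+2g}(\pi)}{N^{2g}} \\
&= \frac{a(\pi)}{N^{n+|\pi|}} + O\left( \frac{1}{N^{n+|\pi|+2}} \right),
\end{aligned}
\]

where $a(\pi) = (-1)^{|\pi|} c_{n,|\pi|}(\pi)$ is the leading asymptotics. We may now continue
our calculation:

\[
\begin{aligned}
\tau_N[U_N X_N^{p(1)} & U_N^* Y_N^{q(1)} \dots U_N X_N^{p(n)} U_N^* Y_N^{q(n)}] \\
&= \sum_{\rho, \sigma \in S(n)} \mathbb{E}\left[ \prod_{k=1}^n U_N(kk) \overline{U_N(k\rho^{-1}\sigma(k))} \right] N^{c(\rho) + c(\sigma^{-1}\gamma) - 1} \, \mathrm{tr}_\rho[X_N^{p(1)}, \dots, X_N^{p(n)}] \, \mathrm{tr}_{\sigma^{-1}\gamma}[Y_N^{q(1)}, \dots, Y_N^{q(n)}] \\
&= \sum_{\rho, \sigma \in S(n)} \left( a(\rho^{-1}\sigma) + O\left( \frac{1}{N^2} \right) \right) N^{|\gamma| - |\rho| - |\rho^{-1}\sigma| - |\sigma^{-1}\gamma|} \, \mathrm{tr}_\rho[X_N^{p(1)}, \dots, X_N^{p(n)}] \, \mathrm{tr}_{\sigma^{-1}\gamma}[Y_N^{q(1)}, \dots, Y_N^{q(n)}].
\end{aligned}
\]

Putting everything together, we have shown that

\[
\begin{aligned}
\tau_N[U_N f_1(X_N) & U_N^* g_1(Y_N) \dots U_N f_n(X_N) U_N^* g_n(Y_N)] \\
&= \sum_{\rho, \sigma \in S(n)} \left( a(\rho^{-1}\sigma) + O\left( \frac{1}{N^2} \right) \right) N^{|\gamma| - |\rho| - |\rho^{-1}\sigma| - |\sigma^{-1}\gamma|} \, \mathrm{tr}_\rho[f_1(X_N), \dots, f_n(X_N)] \, \mathrm{tr}_{\sigma^{-1}\gamma}[g_1(Y_N), \dots, g_n(Y_N)],
\end{aligned}
\]

and it remains to show that the $N \to \infty$ limit of this complicated expression is
zero. To this end, consider the order $|\gamma| - |\rho| - |\rho^{-1}\sigma| - |\sigma^{-1}\gamma|$ of the $\rho, \sigma$ term
in this sum. The positive part, $|\gamma| = n - 1$, is simply the length of any geodesic
joining the identity permutation to $\gamma$ in the Cayley graph of $S(n)$. The negative
part, $-|\rho| - |\rho^{-1}\sigma| - |\sigma^{-1}\gamma|$, is minus the length of a walk from the identity to $\gamma$
made up of three legs: a geodesic from $\mathrm{id}$ to $\rho$, followed by a geodesic from $\rho$ to
$\sigma$, followed by a geodesic from $\sigma$ to $\gamma$. Thus the order of the $\rho, \sigma$ term is at most
$N^0$, and this occurs precisely when $\rho$ and $\sigma$ lie on a geodesic from $\mathrm{id}$ to $\gamma$, see
Figure 16. Thus

\[
\lim_{N \to \infty} \tau_N[U_N f_1(X_N) U_N^* g_1(Y_N) \dots U_N f_n(X_N) U_N^* g_n(Y_N)]
= \sum_{\substack{\rho, \sigma \in S(n) \\ |\rho| + |\rho^{-1}\sigma| + |\sigma^{-1}\gamma| = |\gamma|}} a(\rho^{-1}\sigma) \, \tau_\rho[f_1(X), \dots, f_n(X)] \, \tau_{\sigma^{-1}\gamma}[g_1(Y), \dots, g_n(Y)].
\]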

Since

\[
\tau[f_1(X)] = \tau[g_1(Y)] = \dots = \tau[f_n(X)] = \tau[g_n(Y)] = 0,
\]

in order to show that the sum on the right has all terms equal to zero it suffices
to show that the condition $|\rho| + |\rho^{-1}\sigma| + |\sigma^{-1}\gamma| = |\gamma|$ forces either $\rho$ or $\sigma^{-1}\gamma$ to
have a fixed point. This is because $\tau_\rho$ and $\tau_{\sigma^{-1}\gamma}$ are products determined by the
cycle structure of the indexing permutation, so that a fixed point contributes a
vanishing factor $\tau[f_k(X)]$ or $\tau[g_k(Y)]$. Since $\rho, \sigma$ lie on a geodesic $\mathrm{id} \to \gamma$, we
have $|\rho| + |\sigma^{-1}\gamma| \le |\gamma| = n - 1$, so that one of $\rho$ or $\sigma^{-1}\gamma$ is a product of at most
$(n-1)/2$ transpositions. Such a permutation has at least $n - (n-1)/2 = (n+1)/2$
cycles, and $n$ points cannot be distributed among more than $n/2$ cycles unless at
least one of the cycles is a fixed point. In the extremal case, the transpositions
are disjoint and leave exactly one point fixed.

Figure 16. Only geodesic paths survive in the large $N$ limit.
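The fixed point claim that closes the argument can be confirmed by exhaustive search for small $n$. The following Python sketch, with hypothetical function names and points labelled $0, \dots, n-1$, lists all pairs $(\rho, \sigma)$ satisfying the geodesic condition and checks that $\rho$ or $\sigma^{-1}\gamma$ always has a fixed point.

```python
from itertools import permutations


def compose(p, q):
    """The permutation k -> p[q[k]]."""
    return tuple(p[q[k]] for k in range(len(p)))


def inverse(p):
    """The inverse permutation of p."""
    inv = [0] * len(p)
    for k, image in enumerate(p):
        inv[image] = k
    return tuple(inv)


def length(p):
    """Distance n - c(p) from the identity in the Cayley graph of transpositions."""
    seen, cycle_total = set(), 0
    for start in range(len(p)):
        if start not in seen:
            cycle_total += 1
            k = start
            while k not in seen:
                seen.add(k)
                k = p[k]
    return len(p) - cycle_total


def has_fixed_point(p):
    return any(p[k] == k for k in range(len(p)))


def geodesic_pairs(n):
    """All (rho, sigma^{-1} gamma) with |rho| + |rho^{-1} sigma| + |sigma^{-1} gamma| = |gamma|."""
    gamma = tuple((k + 1) % n for k in range(n))
    group = list(permutations(range(n)))
    pairs = []
    for rho in group:
        for sigma in group:
            middle = compose(inverse(rho), sigma)
            last = compose(inverse(sigma), gamma)
            if length(rho) + length(middle) + length(last) == n - 1:
                pairs.append((rho, last))
    return pairs


for n in range(2, 6):
    pairs = geodesic_pairs(n)
    print(n, len(pairs), all(has_fixed_point(r) or has_fixed_point(l) for r, l in pairs))
```

The search over $S(n) \times S(n)$ grows like $(n!)^2$, so this is a check of the statement for small cases only and not a substitute for the counting argument.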

3.7. GUE + GUE. Imagine that we had been enumeratively lazy in our
construction of the GUE matrix model of a free semicircular pair, and had only shown that
two iid GUE matrices $X_N^{(1)}, X_N^{(2)}$ are asymptotically free without determining their
individual limiting distributions. We could then appeal to the free central limit
theorem to obtain that the limit distribution of the random matrix

\[
S_N = \frac{X_N^{(1)} + \dots + X_N^{(N)}}{\sqrt{N}},
\]

where the $X_N^{(i)}$'s are iid GUE samples, is standard semicircular. On the other hand,
since the matrix elements of the $X_N^{(i)}$'s are independent Gaussians whose variances
add, we see that the rescaled sum $S_N$ is itself an $N \times N$ GUE random matrix for
each finite $N$. Thus we recover Wigner's semicircle law (for GUE matrices) from
the free central limit theorem.

3.8. GUE + deterministic. Let $X_N$ be an $N \times N$ GUE random matrix. Let $Y_N$
be an $N \times N$ deterministic Hermitian matrix whose spectral measure $\nu_N$ converges
weakly to a compactly supported probability measure $\nu$. Let $\sigma$ be the limit
distribution of the random matrix $X_N + Y_N$. Since $X_N, Y_N$ are asymptotically free, we
have

\[
\sigma = \mu \boxplus \nu,
\]

where $\mu$ is the Wigner semicircle.

3.9. randomly rotated + diagonal. Consider the $2N \times 2N$ diagonal matrix

\[
D_{2N} = \mathrm{diag}(1, -1, 1, -1, \dots, 1, -1)
\]

whose diagonal entries are the first $2N$ terms of an alternating sequence of $\pm 1$'s, all
other entries being zero. Let $U_{2N}$ be a $2N \times 2N$ CUE random matrix, and consider
the random Hermitian matrix

\[
A_{2N} = U_{2N} D_{2N} U_{2N}^* + D_{2N}.
\]

Let $\mu_{2N}$ denote the spectral measure of $A_{2N}$. We claim that $\mu_{2N}$ converges weakly
to the arcsine distribution

\[
\mu(dt) = \frac{1}{\pi \sqrt{4 - t^2}} \, dt, \qquad t \in [-2, 2],
\]

as $N \to \infty$.

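Proof: Set $X_{2N} = U_{2N} D_{2N} U_{2N}^*$ and $Y_{2N} = D_{2N}$. Then $X_{2N}, Y_{2N}$ is a random
matrix model for a pair of free random variables $X, Y$ each of which has the
$\pm 1$-Bernoulli distribution

\[
\mu_X = \mu_Y = \frac{1}{2} \delta_{-1} + \frac{1}{2} \delta_{+1}.
\]

Thus the limit distribution of their sum is

\[
\text{Bernoulli} \boxplus \text{Bernoulli} = \text{Arcsine}.
\]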
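The identity Bernoulli $\boxplus$ Bernoulli = Arcsine can be checked at the level of moments. The Python sketch below is an illustration under stated assumptions: it uses the recursive form $m_n = \sum_{s=1}^n \kappa_s \, [z^{n-s}] M(z)^s$ of the free moment-cumulant formula, where $M(z) = \sum_i m_i z^i$, together with the additivity of free cumulants under free convolution, and compares the result with the central binomial coefficients $\binom{2k}{k}$, which are the even moments of the arcsine distribution on $[-2, 2]$. The function names are hypothetical.

```python
from math import comb


def power_coefficient(moments, s, degree):
    """Coefficient of z^degree in (sum_i moments[i] z^i)^s, using moments[0..degree]."""
    result = [1] + [0] * degree
    for _ in range(s):
        new = [0] * (degree + 1)
        for a, coeff in enumerate(result):
            if coeff:
                for b in range(degree + 1 - a):
                    new[a + b] += coeff * moments[b]
        result = new
    return result[degree]


def free_cumulants(moments):
    """Free cumulants kappa[1..n] from moments[0..n]; kappa[0] is a placeholder 0."""
    kappa = [0] * len(moments)
    for n in range(1, len(moments)):
        kappa[n] = moments[n] - sum(
            kappa[s] * power_coefficient(moments, s, n - s) for s in range(1, n))
    return kappa


def moments_from_free_cumulants(kappa):
    """Moments m[0..n] from free cumulants kappa[1..n]."""
    moments = [1]
    for n in range(1, len(kappa)):
        moments.append(kappa[n] + sum(
            kappa[s] * power_coefficient(moments, s, n - s) for s in range(1, n)))
    return moments


bernoulli_moments = [1, 0, 1, 0, 1, 0, 1, 0, 1]  # moments of (delta_{-1} + delta_{+1}) / 2
kappa = free_cumulants(bernoulli_moments)
sum_moments = moments_from_free_cumulants([2 * k for k in kappa])
print(sum_moments)
print([comb(2 * k, k) for k in range(5)])
```

The even entries of the first printed list agree with the second list, and the odd entries vanish by symmetry.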
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